function load_data
Loads a CSV dataset from a specified filepath using pandas, with fallback to creating sample data if the file is not found.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
24 - 34
simple
Purpose
This function reads a CSV file into a pandas DataFrame and prints the dataset's shape and column names. If the file is missing, it catches FileNotFoundError and returns the output of the fallback function 'create_sample_data()' instead of failing.
Source Code
def load_data(filepath='data.csv'):
    """Load the dataset"""
    try:
        df = pd.read_csv(filepath)
        print("Dataset loaded successfully!")
        print(f"Shape: {df.shape}")
        print(f"\nColumns: {df.columns.tolist()}")
        return df
    except FileNotFoundError:
        print("Error: data.csv not found. Creating sample dataset...")
        return create_sample_data()
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| filepath | - | 'data.csv' | positional_or_keyword |
Parameter Details
filepath: String path to the CSV file to be loaded. Defaults to 'data.csv' in the current working directory. Can be an absolute or relative path. Expected to point to a valid CSV file readable by pandas.read_csv().
Return Value
Returns a pandas DataFrame. If the specified file exists, the DataFrame holds the parsed CSV content. If the file is not found (FileNotFoundError), the function returns whatever create_sample_data() returns, which is assumed to also be a DataFrame. The column structure therefore depends on either the CSV content or the sample data generation logic.
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Assuming create_sample_data() is defined
def create_sample_data():
    return pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Load data from default location
df = load_data()

# Load data from custom filepath
df = load_data(filepath='path/to/mydata.csv')

# Access the loaded data
print(df.head())
Best Practices
- Ensure the 'create_sample_data()' function is defined before calling load_data() to avoid NameError when file is not found
- Use absolute paths or ensure correct working directory when specifying custom filepaths
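- Consider adding more specific exception handling for other pandas.read_csv() errors (e.g., parsing errors, encoding issues); note also that the fallback message hard-codes 'data.csv' even when a different filepath is passed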
- The function prints to stdout which may not be ideal for production environments; consider using logging instead
- Validate the returned DataFrame structure matches expected schema before processing
- Consider adding parameters for pandas.read_csv() options (encoding, delimiter, etc.) for more flexibility
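Several of the suggestions above (logging instead of printing, extra read_csv options, narrower exception handling, schema validation) can be combined in one place. The following is a minimal sketch under stated assumptions, not the original implementation: the names load_data_robust, required_columns, and read_csv_kwargs are hypothetical, and create_sample_data() is a stand-in for the real fallback function.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def create_sample_data():
    """Hypothetical stand-in for the fallback generator used by load_data."""
    return pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def load_data_robust(filepath="data.csv", required_columns=None, **read_csv_kwargs):
    """Hypothetical variant of load_data (not the original function).

    Logs instead of printing, forwards extra keyword arguments to
    pandas.read_csv, and checks that expected columns are present.
    """
    try:
        df = pd.read_csv(filepath, **read_csv_kwargs)
    except FileNotFoundError:
        # Same fallback as the original, but the message names the real path.
        logger.warning("%s not found; creating sample dataset", filepath)
        return create_sample_data()
    except (pd.errors.EmptyDataError, pd.errors.ParserError, UnicodeDecodeError) as exc:
        # Surface parsing and encoding problems with the offending path attached.
        raise ValueError(f"Could not parse {filepath}: {exc}") from exc

    # Schema check: fail early if any expected column is absent.
    missing = set(required_columns or []) - set(df.columns)
    if missing:
        raise ValueError(f"{filepath} is missing expected columns: {sorted(missing)}")

    logger.info("Loaded %s with shape %s", filepath, df.shape)
    return df
```

A call such as load_data_robust('path/to/mydata.csv', required_columns=['A', 'B'], sep=';', encoding='utf-8') would pass sep and encoding through to pandas.read_csv. In this sketch the schema check is skipped on the fallback path, since the sample data is not expected to match the real schema; whether that is the right choice depends on how the sample data is used downstream.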
Similar Components
AI-powered semantic similarity - components with related functionality:
- function load_dataset 86.9% similar
- function load_analysis_data 73.8% similar
- function create_test_dataset 57.4% similar
- function upload_data_section_dataset 53.8% similar
- function explore_data 53.4% similar