function smartstat_upload_data
Flask route handler that uploads CSV or Excel data files to a SmartStat analysis session, with support for multi-sheet Excel files and session recovery.
File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 4784 - 4898
Complexity: complex
Purpose
This endpoint handles file uploads for statistical analysis sessions. It validates user permissions, manages session state (including recovery from database if session is lost), processes CSV and Excel files (with multi-sheet support), stores the data in the session, and persists metadata to the database. It's designed for a web application that performs statistical analysis on uploaded datasets.
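The session recovery step mentioned above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the application's code: `Session`, `DataSection`, and `get_or_recover_session` are hypothetical stand-ins for `SmartStatSession`, the persisted data section records, and the lookup that the handler performs inline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class DataSection:
    """Hypothetical stand-in for a persisted data section record."""
    id: str
    title: str
    analysis_session_id: str


@dataclass
class Session:
    """Hypothetical stand-in for SmartStatSession."""
    session_id: str
    data_section_id: str
    title: str


def get_or_recover_session(
    sessions: Dict[str, Session],
    session_id: str,
    user_sections: Iterable[DataSection],
) -> Optional[Session]:
    """Return the in-memory session, or rebuild it from a matching data section."""
    session = sessions.get(session_id)
    if session is not None:
        return session
    for section in user_sections:
        if section.analysis_session_id == session_id:
            session = Session(session_id, section.id, section.title)
            sessions[session_id] = session  # cache the rebuilt session
            return session
    return None  # the handler answers 404 in this case


sessions: Dict[str, Session] = {}
sections = [DataSection(id="ds-1", title="Trial A", analysis_session_id="abc123")]
recovered = get_or_recover_session(sessions, "abc123", sections)
missing = get_or_recover_session(sessions, "nope", sections)
```

As far as the handler shows, a session rebuilt this way carries only its identifiers and title; any dataframe held by the lost in-memory session is not restored until a file is uploaded again.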
Source Code
def smartstat_upload_data(session_id):
    """Upload CSV data to SmartStat session"""
    user_email = get_current_user()
    logger.info(f"Upload request for session {session_id}")
    logger.info(f"Available sessions: {list(smartstat_service.sessions.keys())}")

    # Verify session exists - if not, try to recreate from data_section
    session = smartstat_service.get_session(session_id)
    if not session:
        logger.warning(f"Session {session_id} not found - attempting to recover from database")
        # Find data section with this session_id
        data_section = None
        all_sections = data_section_service.get_user_data_sections(user_email)
        for ds in all_sections:
            if ds.analysis_session_id == session_id:
                data_section = ds
                break
        if data_section:
            # Recreate session
            logger.info(f"Recreating session for data section {data_section.id}")
            session = SmartStatSession(
                session_id=session_id,
                data_section_id=data_section.id,
                title=data_section.title
            )
            smartstat_service.sessions[session_id] = session
        else:
            logger.error(f"Could not recover session {session_id}")
            return jsonify({'error': f'Session not found and could not be recovered'}), 404

    # Verify data section ownership
    data_section = data_section_service.get_data_section(session.data_section_id)
    if not data_section or data_section.owner != user_email:
        return jsonify({'error': 'Access denied'}), 403

    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400

    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400

    # Check file extension - support CSV and Excel
    file_ext = file.filename.lower().rsplit('.', 1)[-1] if '.' in file.filename else ''
    if file_ext not in ['csv', 'xlsx', 'xls']:
        return jsonify({'error': 'Only CSV and Excel files (.csv, .xlsx, .xls) are supported'}), 400

    try:
        # Save uploaded file
        import uuid
        from pathlib import Path
        filename = f"{uuid.uuid4()}_{file.filename}"
        file_path = Path(smartstat_config.UPLOAD_FOLDER) / filename
        file.save(str(file_path))

        # Check if Excel file with multiple sheets
        if file_ext in ['xlsx', 'xls']:
            from smartstat_service import read_excel_file
            # Get sheet information
            excel_info = read_excel_file(str(file_path))
            if excel_info['total_sheets'] > 1:
                # Multiple sheets - return sheet selection interface
                session.excel_sheets = excel_info['sheets']
                session.updated_at = datetime.now()
                return jsonify({
                    'success': True,
                    'requires_sheet_selection': True,
                    'sheets': excel_info['sheets'],
                    'file_path': str(file_path),  # Send file path for later sheet selection
                    'message': f'Excel file contains {excel_info["total_sheets"]} sheets. Please select one to analyze.'
                })
            else:
                # Single sheet - load it directly
                sheet_name = excel_info['sheets'][0]['name']
                sheet_data = read_excel_file(str(file_path), sheet_name=sheet_name)
                df = sheet_data['dataframe']
                session.active_sheet = sheet_name
        else:
            # CSV file - load directly
            from smartstat_service import smart_read_csv
            df = smart_read_csv(str(file_path))

        # Store dataframe in session
        session.dataframe = df
        session.updated_at = datetime.now()

        # Upload to SmartStat session (this will be updated to handle the df directly)
        result = smartstat_service.upload_data(session_id, str(file_path), dataframe=df)
        if result['success']:
            # Store dataset info in data section metadata for persistence
            # Get fresh data section and update metadata
            data_section = data_section_service.get_data_section(session.data_section_id)
            if not data_section.metadata:
                data_section.metadata = {}
            data_section.metadata['dataset_rows'] = len(df)
            data_section.metadata['dataset_columns'] = list(df.columns)
            data_section.metadata['upload_filename'] = file.filename
            if session.active_sheet:
                data_section.metadata['active_sheet'] = session.active_sheet
            data_section_service.update_data_section(data_section)
            return jsonify(result)
        else:
            return jsonify(result), 400
    except Exception as e:
        logger.error(f"Error uploading data to SmartStat: {e}")
        return jsonify({'error': str(e)}), 500
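The extension check in the handler is compact enough that its edge cases are easy to miss. The sketch below lifts the same expression into a standalone function so its behavior can be inspected; `ALLOWED_EXTENSIONS`, `file_extension`, and `is_supported` are hypothetical names introduced here for illustration.

```python
ALLOWED_EXTENSIONS = {"csv", "xlsx", "xls"}


def file_extension(filename: str) -> str:
    """Same expression as the handler: lowercase text after the last dot, or ''."""
    return filename.lower().rsplit(".", 1)[-1] if "." in filename else ""


def is_supported(filename: str) -> bool:
    """True when the extension is one the handler accepts."""
    return file_extension(filename) in ALLOWED_EXTENSIONS


examples = ["data.CSV", "report.final.xlsx", "archive.tar.gz", "README", ".csv"]
results = {name: (file_extension(name), is_supported(name)) for name in examples}
```

Only the text after the last dot is considered, so `report.final.xlsx` is accepted and `archive.tar.gz` is rejected. A file named just `.csv` also passes, because the check looks at the name alone and never at the file's content.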
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| session_id | - | - | positional_or_keyword |
Parameter Details
session_id: String identifier for the SmartStat analysis session. Used to retrieve or recover the session, verify ownership, and associate uploaded data with the correct analysis context.
Return Value
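Returns a Flask JSON response. For a multi-sheet Excel file (200): {'success': True, 'requires_sheet_selection': True, 'sheets': list, 'file_path': str, 'message': str}; no data is loaded until a sheet is selected. For a CSV or single-sheet Excel file (200): the result dictionary from smartstat_service.upload_data(). On error: {'error': str} with status 400 (no file, empty filename, or unsupported extension), 403 (access denied), 404 (session not found and not recoverable), or 500 (unhandled exception). If smartstat_service.upload_data() reports failure, its result dictionary is returned unchanged with status 400.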
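A client has to distinguish these response shapes before acting on them. The following is a minimal sketch of that branching, assuming each entry in `sheets` is a dictionary with a `name` key (as the handler's `excel_info['sheets'][0]['name']` suggests); `describe_upload_response` is a hypothetical helper, not part of the application.

```python
from typing import Any, Dict


def describe_upload_response(status_code: int, payload: Dict[str, Any]) -> str:
    """Turn an upload response into a short human-readable outcome."""
    if status_code == 200 and payload.get("requires_sheet_selection"):
        names = [sheet.get("name", "?") for sheet in payload.get("sheets", [])]
        return "choose a sheet: " + ", ".join(names)
    if status_code == 200:
        return "upload accepted"
    reasons = {
        400: "bad request",
        403: "access denied",
        404: "session not found",
        500: "server error",
    }
    label = reasons.get(status_code, "unexpected status")
    # A failure result passed through from the service may lack an 'error' key,
    # so read it defensively instead of indexing.
    return f"{label}: {payload.get('error', 'no detail provided')}"


multi = describe_upload_response(
    200,
    {"success": True, "requires_sheet_selection": True,
     "sheets": [{"name": "Q1"}, {"name": "Q2"}]},
)
denied = describe_upload_response(403, {"error": "Access denied"})
```

Reading `error` with `.get()` matters for the 400 case: the shape of a failed `smartstat_service.upload_data()` result is not shown in the source, so a client should not assume the key is present.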
Dependencies
- flask
- uuid
- pathlib
- datetime
- pandas
- openpyxl
- xlrd
- werkzeug
Required Imports
from flask import request, jsonify
from datetime import datetime
import uuid
from pathlib import Path
from smartstat_service import SmartStatService, SmartStatSession, smart_read_csv, read_excel_file
from services import DataSectionService
import logging
Conditional/Optional Imports
These imports are only needed under specific conditions:
import uuid
Condition: imported inside function for generating unique filenames
Required (conditional)
from pathlib import Path
Condition: imported inside function for file path handling
Required (conditional)
from smartstat_service import read_excel_file
Condition: imported inside function when processing Excel files
Required (conditional)
from smartstat_service import smart_read_csv
Condition: imported inside function when processing CSV files
Required (conditional)
Usage Example
# This is a Flask route handler, typically called via HTTP POST request
# Example using curl:
# curl -X POST \
#   -H "Authorization: Bearer <token>" \
#   -F "file=@data.csv" \
#   http://localhost:5000/api/smartstat/abc123/upload

# Example using Python requests:
import requests

session_id = 'abc123'
file_path = 'data.csv'
auth_token = 'your_auth_token'

with open(file_path, 'rb') as f:
    files = {'file': f}
    headers = {'Authorization': f'Bearer {auth_token}'}
    response = requests.post(
        f'http://localhost:5000/api/smartstat/{session_id}/upload',
        files=files,
        headers=headers
    )

if response.status_code == 200:
    result = response.json()
    if result.get('requires_sheet_selection'):
        print(f"Select from sheets: {result['sheets']}")
    else:
        print("Data uploaded successfully")
else:
    print(f"Error: {response.json()['error']}")
Best Practices
- Ensure the UPLOAD_FOLDER directory exists and is writable before deployment
- Implement file size limits to prevent disk space exhaustion (not shown in code)
- Uploaded files are never deleted by this function - add cleanup logic after processing to prevent storage bloat
- The session recovery mechanism assumes data_section.analysis_session_id is properly maintained
- File validation only checks extensions - consider adding content validation for production
- For multi-sheet Excel files the server-side file path is returned to the client for later sheet selection - validate that path when it is sent back and ensure the file is eventually cleaned up
- Session state is stored in memory (smartstat_service.sessions) - consider persistent storage for production
- Errors are logged and the exception message is returned to the client in the 500 response - configure log rotation and consider returning a generic message in production
- The function modifies data_section.metadata directly - ensure thread safety if using concurrent requests
- For Excel files, openpyxl (for .xlsx) and xlrd (for .xls) must be installed
- The dataframe is stored in session memory - monitor memory usage for large datasets
- Authentication is expected to be handled by a require_auth decorator, which is not shown in the source above - ensure it's properly configured
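Several of the points above (extension-only validation, missing cleanup) can be illustrated with a small standard-library sketch. The handler also embeds the client-supplied filename in the stored name as-is, so the sketch sanitizes it first. Everything here is an assumption for illustration: `safe_storage_name`, `content_matches_extension`, and `process_then_cleanup` are hypothetical helpers, and the signature check is only a cheap first pass, not full content validation.

```python
import re
import tempfile
import uuid
from pathlib import Path

# Leading bytes of the two Excel container formats.
XLSX_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP archive
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # .xls is an OLE2 compound file


def safe_storage_name(original: str) -> str:
    """Build a unique name that keeps only a sanitized basename of the upload."""
    base = Path(original.replace("\\", "/")).name  # drop any directory parts
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base) or "upload"
    return f"{uuid.uuid4().hex}_{base}"


def content_matches_extension(path: Path, ext: str) -> bool:
    """Cheap signature check; CSV has no signature, so it is accepted as-is."""
    with path.open("rb") as handle:
        head = handle.read(8)
    if ext == "xlsx":
        return head.startswith(XLSX_MAGIC)
    if ext == "xls":
        return head.startswith(XLS_MAGIC)
    return ext == "csv"


def process_then_cleanup(path: Path, ext: str) -> bool:
    """Validate a stored upload and always remove the file afterwards."""
    try:
        return content_matches_extension(path, ext)
    finally:
        path.unlink(missing_ok=True)  # requires Python 3.8+


stored_name = safe_storage_name("../../evil.xlsx")
with tempfile.TemporaryDirectory() as folder:
    fake = Path(folder) / stored_name
    fake.write_bytes(b"not really a spreadsheet")
    accepted = process_then_cleanup(fake, "xlsx")
    still_there = fake.exists()
```

Unconditional cleanup like this only fits the CSV and single-sheet paths. In the multi-sheet case the file has to survive until a sheet is chosen, so removal there would need to happen after selection or when the session expires.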
Similar Components
AI-powered semantic similarity - components with related functionality:
- function smartstat_upload_files 88.2% similar
- function upload_analysis_dataset 78.6% similar
- function smartstat_select_sheet 78.6% similar
- function upload_data 76.6% similar
- function smartstat_save_to_document 73.9% similar