function smartstat_upload_data

Maturity: 56

Flask route handler that uploads CSV or Excel data files to a SmartStat analysis session, with support for multi-sheet Excel files and session recovery.

File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 4784 - 4898
Complexity: complex

Purpose

This endpoint handles file uploads for statistical analysis sessions. It validates user permissions, manages session state (including recovery from database if session is lost), processes CSV and Excel files (with multi-sheet support), stores the data in the session, and persists metadata to the database. It's designed for a web application that performs statistical analysis on uploaded datasets.
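The session-recovery step described above can be sketched in isolation. This is a minimal illustration, not the application's code: `DataSectionStub`, `SessionStub`, and `get_or_recover_session` are hypothetical stand-ins for the real data section objects, `SmartStatSession`, and the inline lookup in the handler.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class DataSectionStub:
    """Hypothetical stand-in for a persisted data section."""
    id: str
    title: str
    owner: str
    analysis_session_id: Optional[str] = None


@dataclass
class SessionStub:
    """Hypothetical stand-in for SmartStatSession."""
    session_id: str
    data_section_id: str
    title: str


def get_or_recover_session(
    sessions: Dict[str, SessionStub],
    session_id: str,
    user_sections: Iterable[DataSectionStub],
) -> Optional[SessionStub]:
    """Return the in-memory session, or rebuild it from a matching data section."""
    session = sessions.get(session_id)
    if session is not None:
        return session

    # Look for a data section that still references the lost session.
    match = next(
        (ds for ds in user_sections if ds.analysis_session_id == session_id),
        None,
    )
    if match is None:
        return None  # The caller would answer with a 404 in this case.

    # Recreate the session and register it in the in-memory store.
    session = SessionStub(
        session_id=session_id,
        data_section_id=match.id,
        title=match.title,
    )
    sessions[session_id] = session
    return session
```

Only the session shell is rebuilt here; any dataframe held by the lost session is gone until a file is uploaded again, which matches the behavior of the handler.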

Source Code

def smartstat_upload_data(session_id):
    """Upload a CSV or Excel data file to a SmartStat session"""
    user_email = get_current_user()
    
    logger.info(f"Upload request for session {session_id}")
    logger.info(f"Available sessions: {list(smartstat_service.sessions.keys())}")
    
    # Verify session exists - if not, try to recreate from data_section
    session = smartstat_service.get_session(session_id)
    if not session:
        logger.warning(f"Session {session_id} not found - attempting to recover from database")
        
        # Find data section with this session_id
        data_section = None
        all_sections = data_section_service.get_user_data_sections(user_email)
        for ds in all_sections:
            if ds.analysis_session_id == session_id:
                data_section = ds
                break
        
        if data_section:
            # Recreate session
            logger.info(f"Recreating session for data section {data_section.id}")
            session = SmartStatSession(
                session_id=session_id,
                data_section_id=data_section.id,
                title=data_section.title
            )
            smartstat_service.sessions[session_id] = session
        else:
            logger.error(f"Could not recover session {session_id}")
            return jsonify({'error': 'Session not found and could not be recovered'}), 404
    
    # Verify data section ownership
    data_section = data_section_service.get_data_section(session.data_section_id)
    if not data_section or data_section.owner != user_email:
        return jsonify({'error': 'Access denied'}), 403
    
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400
    
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400
    
    # Check file extension - support CSV and Excel
    file_ext = file.filename.lower().rsplit('.', 1)[-1] if '.' in file.filename else ''
    if file_ext not in ['csv', 'xlsx', 'xls']:
        return jsonify({'error': 'Only CSV and Excel files (.csv, .xlsx, .xls) are supported'}), 400
    
    try:
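        # Save uploaded file under a unique, sanitized name
        import uuid
        from pathlib import Path
        from werkzeug.utils import secure_filename
        # secure_filename strips path separators from the client-supplied name
        filename = f"{uuid.uuid4()}_{secure_filename(file.filename)}"
        file_path = Path(smartstat_config.UPLOAD_FOLDER) / filename
        file.save(str(file_path))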
        
        # Check if Excel file with multiple sheets
        if file_ext in ['xlsx', 'xls']:
            from smartstat_service import read_excel_file
            
            # Get sheet information
            excel_info = read_excel_file(str(file_path))
            
            if excel_info['total_sheets'] > 1:
                # Multiple sheets - return sheet selection interface
                session.excel_sheets = excel_info['sheets']
                session.updated_at = datetime.now()
                
                return jsonify({
                    'success': True,
                    'requires_sheet_selection': True,
                    'sheets': excel_info['sheets'],
                    'file_path': str(file_path),  # Send file path for later sheet selection
                    'message': f'Excel file contains {excel_info["total_sheets"]} sheets. Please select one to analyze.'
                })
            else:
                # Single sheet - load it directly
                sheet_name = excel_info['sheets'][0]['name']
                sheet_data = read_excel_file(str(file_path), sheet_name=sheet_name)
                df = sheet_data['dataframe']
                session.active_sheet = sheet_name
        else:
            # CSV file - load directly
            from smartstat_service import smart_read_csv
            df = smart_read_csv(str(file_path))
        
        # Store dataframe in session
        session.dataframe = df
        session.updated_at = datetime.now()
        
        # Upload to SmartStat session (this will be updated to handle the df directly)
        result = smartstat_service.upload_data(session_id, str(file_path), dataframe=df)
        
        if result['success']:
            # Store dataset info in data section metadata for persistence
            # Get fresh data section and update metadata
            data_section = data_section_service.get_data_section(session.data_section_id)
            if not data_section.metadata:
                data_section.metadata = {}
            data_section.metadata['dataset_rows'] = len(df)
            data_section.metadata['dataset_columns'] = list(df.columns)
            data_section.metadata['upload_filename'] = file.filename
            if session.active_sheet:
                data_section.metadata['active_sheet'] = session.active_sheet
            data_section_service.update_data_section(data_section)
            
            return jsonify(result)
        else:
            return jsonify(result), 400
            
    except Exception as e:
        logger.error(f"Error uploading data to SmartStat: {e}")
        return jsonify({'error': str(e)}), 500

Parameters

Name        Type  Default  Kind
session_id  -     -        positional_or_keyword

Parameter Details

session_id: String identifier for the SmartStat analysis session. Used to retrieve or recover the session, verify ownership, and associate uploaded data with the correct analysis context.

Return Value

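Returns a Flask JSON response. On success (200) the body takes one of two forms: for a multi-sheet Excel file, {'success': True, 'requires_sheet_selection': True, 'sheets': list, 'file_path': str, 'message': str}; otherwise, the result dictionary returned by smartstat_service.upload_data(). On error the body is {'error': str} with status 400 (missing file, empty filename, or unsupported extension), 403 (access denied), 404 (session not found and not recoverable), or 500 (unhandled exception). If smartstat_service.upload_data() reports failure, its result dictionary is returned unchanged with status 400.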
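A client can branch on these response shapes with a small helper. The sketch below is an assumption about how a caller might organize that logic; `interpret_upload_response` and its outcome labels are hypothetical names, and only the status codes and the keys `requires_sheet_selection`, `sheets`, `name`, and `error` come from the handler above.

```python
from typing import Any, Dict, Tuple


def interpret_upload_response(status_code: int, body: Dict[str, Any]) -> Tuple[str, Any]:
    """Map a status code and JSON body to an (outcome, detail) pair."""
    if status_code == 200:
        if body.get("requires_sheet_selection"):
            # Multi-sheet Excel: the caller must pick a sheet before analysis.
            return "select_sheet", [sheet["name"] for sheet in body.get("sheets", [])]
        # Single-sheet Excel or CSV: the data is already loaded into the session.
        return "uploaded", body

    # Status codes documented for this endpoint; anything else is unexpected.
    reasons = {
        400: "bad_request",
        403: "access_denied",
        404: "session_not_found",
        500: "server_error",
    }
    # A failed upload_data() result may not carry an 'error' key, hence .get().
    return reasons.get(status_code, "unexpected"), body.get("error")
```

Returning a label with a detail value keeps the HTTP specifics in one place, so calling code only has to handle a handful of outcomes.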

Dependencies

  • flask
  • uuid
  • pathlib
  • datetime
  • pandas
  • openpyxl
  • xlrd
  • werkzeug

Required Imports

from flask import request, jsonify
from datetime import datetime
import uuid
from pathlib import Path
from smartstat_service import SmartStatService, SmartStatSession, smart_read_csv, read_excel_file
from services import DataSectionService
import logging

Conditional/Optional Imports

These imports are only needed under specific conditions:

import uuid

Condition: imported inside the function to generate unique filenames. Required (conditional).

from pathlib import Path

Condition: imported inside the function for file path handling. Required (conditional).

from smartstat_service import read_excel_file

Condition: imported inside the function when processing Excel files. Required (conditional).

from smartstat_service import smart_read_csv

Condition: imported inside the function when processing CSV files. Required (conditional).

Usage Example

# This is a Flask route handler, typically called via HTTP POST request
# Example using curl:
# curl -X POST \
#   -H "Authorization: Bearer <token>" \
#   -F "file=@data.csv" \
#   http://localhost:5000/api/smartstat/abc123/upload

# Example using Python requests:
import requests

session_id = 'abc123'
file_path = 'data.csv'
auth_token = 'your_auth_token'

with open(file_path, 'rb') as f:
    files = {'file': f}
    headers = {'Authorization': f'Bearer {auth_token}'}
    response = requests.post(
        f'http://localhost:5000/api/smartstat/{session_id}/upload',
        files=files,
        headers=headers
    )
    
if response.status_code == 200:
    result = response.json()
    if result.get('requires_sheet_selection'):
        print(f"Select from sheets: {result['sheets']}")
    else:
        print("Data uploaded successfully")
else:
    print(f"Error: {response.json()['error']}")

Best Practices

  • Ensure UPLOAD_FOLDER directory exists and has proper write permissions before deployment
  • Implement file size limits to prevent disk space exhaustion (not shown in code)
  • Consider cleaning up uploaded files after processing to prevent storage bloat
  • The session recovery mechanism assumes data_section.analysis_session_id is properly maintained
  • File validation only checks extensions - consider adding content validation for production
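  • For multi-sheet Excel files the function stores the sheet list in the session and returns the server-side file path to the client for later sheet selection - this exposes an internal path, so ensure the file is cleaned up and the path is validated when it is sent back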
  • Session state is stored in memory (smartstat_service.sessions) - consider persistent storage for production
  • Error logging is comprehensive but ensure log rotation is configured for production
  • The function modifies data_section.metadata directly - ensure thread safety if using concurrent requests
  • For Excel files, openpyxl (for .xlsx) and xlrd (for .xls) must be installed
  • The dataframe is stored in session memory - monitor memory usage for large datasets
  • Authentication is expected to be enforced by a require_auth decorator, which is not part of the extracted source above - ensure it is applied and properly configured
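Two of the points above, content validation and safe handling of the uploaded filename, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not part of the application: `content_matches_extension` and `safe_stored_name` are hypothetical helpers, and the CSV check is a crude heuristic because CSV has no file signature.

```python
import re
import uuid

# Well-known file signatures: .xlsx is a ZIP container, .xls is an OLE2 compound file.
ZIP_MAGIC = b"PK\x03\x04"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def file_extension(filename: str) -> str:
    """Same extension rule as the handler: text after the last dot, lowercased."""
    return filename.lower().rsplit(".", 1)[-1] if "." in filename else ""


def content_matches_extension(ext: str, head: bytes) -> bool:
    """Check the first bytes of a file against what its extension claims."""
    if ext == "xlsx":
        return head.startswith(ZIP_MAGIC)
    if ext == "xls":
        return head.startswith(OLE2_MAGIC)
    if ext == "csv":
        # Heuristic only: treat NUL bytes as a sign of binary content.
        # This would also reject UTF-16 encoded CSV files.
        return b"\x00" not in head
    return False


def safe_stored_name(filename: str) -> str:
    """Build a unique on-disk name that cannot contain path separators."""
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".") or "upload"
    return f"{uuid.uuid4().hex}_{base}"
```

In a handler, `head` could be obtained by reading the first few bytes of the uploaded stream and then seeking back to the start before saving. Flask's `MAX_CONTENT_LENGTH` configuration setting is the usual way to cap upload size at the framework level.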

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function smartstat_upload_files 88.2% similar

    Flask API endpoint that handles multi-file uploads (CSV, Excel, PDF, Word, PowerPoint) to a SmartStat session, processing data files as datasets and documents as information sheets.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function upload_analysis_dataset 78.6% similar

    Flask API endpoint that handles file upload for data analysis sessions, accepting CSV and Excel files, validating user access, and processing the dataset through a data analysis service.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function smartstat_select_sheet 78.6% similar

    Flask API endpoint that processes one or more Excel sheets from an uploaded file, validates them, categorizes them as datasets or information sheets, and adds them to a SmartStat analysis session.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function upload_data 76.6% similar

    Flask route handler that accepts file uploads via POST request, validates the file, saves it with a timestamp, and loads the data into an analysis session.

    From: /tf/active/vicechatdev/full_smartstat/app.py
  • function smartstat_save_to_document 73.9% similar

    Flask route handler that saves SmartStat statistical analysis results back to a data section document, generating a final report with queries, results, and plots.

    From: /tf/active/vicechatdev/vice_ai/new_app.py