function upload_data_section_dataset
Flask API endpoint that handles CSV file uploads for data section analysis, processes the file, extracts metadata, and stores it in the data section for persistence.
File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 4464 - 4527
Complexity: moderate
Purpose
This endpoint allows authenticated users to upload CSV datasets to their data sections for analysis. It validates file ownership, checks for an active analysis session, processes the uploaded CSV file to extract metadata (rows, columns, preview), stores this information in the data section, and integrates with the data analysis service for further processing.
Source Code
def upload_data_section_dataset(section_id):
    """Upload dataset for data section analysis"""
    if not DATA_ANALYSIS_AVAILABLE:
        return jsonify({'error': 'Data analysis service not available'}), 503

    user_email = get_current_user()

    # Verify ownership
    data_section = data_section_service.get_data_section(section_id)
    if not data_section or data_section.owner != user_email:
        return jsonify({'error': 'Data section not found or access denied'}), 404

    if not data_section.analysis_session_id:
        return jsonify({'error': 'No analysis session found. Create session first.'}), 400

    if 'file' not in request.files:
        return jsonify({'error': 'No file uploaded'}), 400

    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400

    if not file.filename.lower().endswith('.csv'):
        return jsonify({'error': 'Only CSV files are supported'}), 400

    try:
        import tempfile
        from werkzeug.utils import secure_filename

        filename = secure_filename(file.filename)

        with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp_file:
            file.save(tmp_file.name)

        # Process the dataset
        result = data_analysis_service.upload_dataset(
            session_id=data_section.analysis_session_id,
            file_path=tmp_file.name,
            original_filename=filename
        )

        # Save CSV data to data section for persistence
        import pandas as pd
        from smartstat_service import smart_read_csv

        df = smart_read_csv(tmp_file.name)
        csv_info = {
            'rows': len(df),
            'columns': len(df.columns),
            'column_names': list(df.columns),
            'preview': df.head(20).to_dict('records')
        }

        # Update the data section with CSV info
        data_section = data_section_service.get_data_section(section_id)
        data_section.csv_data = json.dumps(csv_info)
        data_section_service.update_data_section(data_section)

        # Clean up temp file
        os.unlink(tmp_file.name)

        return jsonify(result)

    except Exception as e:
        logger.error(f"Error uploading dataset: {e}")
        return jsonify({'error': str(e)}), 500
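The csv_info dictionary above is what later requests can recover from the section without re-reading the uploaded file. Below is a minimal sketch of reading it back; the stored structure comes from the source, while the retrieval context (module-level data_section_service, the example section id) is an assumption for illustration.

# Sketch: recovering the persisted CSV metadata from a data section.
# data_section_service and csv_data come from new_app.py; section_id is a hypothetical value.
import json

section_id = 'section-123'                # hypothetical section id
data_section = data_section_service.get_data_section(section_id)
csv_info = json.loads(data_section.csv_data)

print(csv_info['rows'], 'rows x', csv_info['columns'], 'columns')
print('Columns:', csv_info['column_names'])
preview_rows = csv_info['preview']        # first 20 rows as a list of dicts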
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| section_id | - | - | positional_or_keyword |
Parameter Details
section_id: String identifier for the data section where the dataset will be uploaded. Must correspond to an existing data section owned by the authenticated user.
Return Value
Returns a JSON response. On success (200), returns the result from data_analysis_service.upload_dataset containing upload confirmation and metadata. On error, returns JSON with 'error' key and appropriate HTTP status code: 503 if data analysis service unavailable, 404 if section not found or access denied, 400 if no analysis session exists or file validation fails, 500 for processing errors.
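The error payloads are fixed strings in the source, so they can be enumerated directly; the 200 body depends on what data_analysis_service.upload_dataset returns and is not reproduced here. A minimal sketch of the status-code-to-body mapping:

# Sketch of the error responses this endpoint can produce (taken from the source above).
# The 500 body carries the stringified exception, so its text varies.
ERROR_RESPONSES = {
    503: {'error': 'Data analysis service not available'},
    404: {'error': 'Data section not found or access denied'},
    400: {'error': 'No analysis session found. Create session first.'},
    # Other 400 bodies: 'No file uploaded', 'No file selected',
    # 'Only CSV files are supported'
    500: {'error': '<str(exception)>'},
}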
Dependencies
flask, werkzeug, pandas, tempfile, os, json, logging
Required Imports
from flask import request, jsonify
from werkzeug.utils import secure_filename
import tempfile
import os
import json
import pandas as pd
from smartstat_service import smart_read_csv
from services import DataSectionService
from data_analysis_service import DataAnalysisService
from auth.azure_auth import require_auth, get_current_user
Conditional/Optional Imports
These imports are only needed under specific conditions:
import tempfile
Condition: imported inside try block for temporary file handling during CSV upload
Required (conditional)

from werkzeug.utils import secure_filename
Condition: imported inside try block for sanitizing uploaded filenames
Required (conditional)

import pandas as pd
Condition: imported inside try block for reading and processing CSV data
Required (conditional)

from smartstat_service import smart_read_csv
Condition: imported inside try block for intelligent CSV reading with encoding detection
Required (conditional)

Usage Example
// Client-side usage example (JavaScript fetch)
const formData = new FormData();
formData.append('file', csvFile); // csvFile is a File object

fetch('/api/data-sections/section-123/analysis/upload', {
    method: 'POST',
    headers: {
        'Authorization': 'Bearer ' + authToken
    },
    body: formData
})
    .then(response => response.json())
    .then(data => {
        console.log('Upload successful:', data);
        // data contains upload result with dataset info
    })
    .catch(error => {
        console.error('Upload failed:', error);
    });

# Server-side context (Flask app setup)
from flask import Flask, request
from services import DataSectionService
from data_analysis_service import DataAnalysisService
from auth.azure_auth import require_auth

app = Flask(__name__)
data_section_service = DataSectionService()
data_analysis_service = DataAnalysisService()
DATA_ANALYSIS_AVAILABLE = True

# The endpoint is then registered with decorators:
# @app.route('/api/data-sections/<section_id>/analysis/upload', methods=['POST'])
# @require_auth
# def upload_data_section_dataset(section_id): ...
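The same upload can be driven from Python. This is a sketch only: the route and bearer-token auth mirror the fetch example above, and the host, token, filename, and section id are illustrative assumptions rather than values defined by the endpoint.

# Python client sketch; adjust host, token, and section id to your deployment.
import requests

auth_token = '...'                        # obtained from your auth flow
section_id = 'section-123'                # hypothetical section id
url = f'https://your-server.example/api/data-sections/{section_id}/analysis/upload'

with open('measurements.csv', 'rb') as f:
    response = requests.post(
        url,
        headers={'Authorization': 'Bearer ' + auth_token},
        files={'file': ('measurements.csv', f, 'text/csv')},
    )

if response.ok:
    print('Upload successful:', response.json())
else:
    print('Upload failed:', response.status_code, response.json().get('error'))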
Best Practices
- Always verify user ownership of the data section before allowing file uploads to prevent unauthorized access
- Ensure an analysis session exists before uploading datasets to maintain proper workflow
- Use secure_filename() to sanitize uploaded filenames and prevent directory traversal attacks
- Store files in temporary locations and clean up (os.unlink) after processing to avoid disk space issues
- Limit file uploads to CSV format only for security and consistency
- Extract and store dataset metadata (rows, columns, preview) for quick access without re-reading files
- Wrap file processing in try-except blocks to handle encoding issues, malformed CSV, and other errors gracefully
- Use smart_read_csv for robust CSV reading with automatic encoding detection
- Return appropriate HTTP status codes (503, 404, 400, 500) to help clients handle different error scenarios
- Log errors with sufficient detail for debugging while avoiding sensitive data exposure
- Consider implementing file size limits to prevent resource exhaustion (see the sketch after this list)
- Store only a preview (first 20 rows) rather than entire dataset to optimize storage
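As a sketch of the size-limit and temp-file cleanup recommendations above: Flask's MAX_CONTENT_LENGTH config rejects oversized request bodies with a 413 before the handler runs, and a try/finally block guarantees the temporary file is removed even when processing fails. The 16 MB cap and the helper function are illustrative assumptions, not part of new_app.py.

# Sketch: enforcing an upload size cap and guaranteed temp-file cleanup.
import os
import tempfile

from flask import Flask

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024   # Flask returns 413 for larger bodies

def process_upload(file_storage, process):
    """Save an uploaded file to a temp path, run `process` on it, and always clean up."""
    with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp_file:
        file_storage.save(tmp_file.name)
    try:
        return process(tmp_file.name)
    finally:
        os.unlink(tmp_file.name)                       # runs on success and on failure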
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
- function upload_analysis_dataset (86.3% similar)
- function upload_data (77.8% similar)
- function smartstat_upload_data (72.8% similar)
- function save_data_section_analysis (71.9% similar)
- function smartstat_upload_files (71.2% similar)