function analyze_data
Flask route handler that starts an asynchronous data analysis for a user query, spawning a background thread to perform the analysis and returning an analysis ID for progress tracking.
File: /tf/active/vicechatdev/full_smartstat/app.py
Lines: 1102-1146
Complexity: moderate
Purpose
This endpoint serves as the entry point for starting data analysis tasks in a web application. It accepts a user query along with configuration options (model selection, context inclusion, specific analyses), validates the input, creates a unique analysis ID for tracking, and spawns a background thread to execute the analysis asynchronously. This allows the API to respond immediately while the potentially long-running analysis continues in the background. The function is designed for multi-user scenarios where each session can have multiple concurrent analyses.
Source Code
def analyze_data(session_id):
    """Start asynchronous analysis process"""
    try:
        data = request.get_json()
        user_query = data.get('query', '')
        model = data.get('model', 'gpt-4o')  # Default to gpt-4o
        include_previous_context = data.get('include_previous_context', True)  # Default to True
        selected_analyses = data.get('selected_analyses', [])  # Specific analyses to include

        if not user_query:
            return jsonify({
                'success': False,
                'error': 'No query provided'
            }), 400

        # Initialize progress tracking
        analysis_id = str(uuid.uuid4())
        analysis_progress[analysis_id] = {
            'status': 'starting',
            'progress': 0,
            'message': 'Initializing analysis...',
            'started_at': datetime.now().isoformat(),
            'session_id': session_id
        }

        # Start analysis in background thread
        thread = threading.Thread(
            target=run_analysis_async,
            args=(analysis_id, session_id, user_query, model, include_previous_context, selected_analyses)
        )
        thread.daemon = True
        thread.start()

        return jsonify({
            'success': True,
            'analysis_id': analysis_id,
            'message': 'Analysis started'
        })
    except Exception as e:
        logger.error(f"Error starting analysis: {str(e)}")
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| session_id | str | - | positional_or_keyword |
Parameter Details
session_id: String identifier for the user session. Used to associate the analysis with a specific user session and retrieve session-specific data. This parameter comes from the URL route path (/analyze/<session_id>).
Return Value
Returns a Flask JSON response tuple. On success: (jsonify({'success': True, 'analysis_id': str, 'message': str}), 200) where analysis_id is a UUID string for tracking the analysis progress. On validation error: (jsonify({'success': False, 'error': str}), 400) when no query is provided. On server error: (jsonify({'success': False, 'error': str}), 500) for any unexpected exceptions.
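The three response shapes above can be handled uniformly on the client side. A minimal sketch (the helper name and exception choices are illustrative, not part of the API):

```python
def handle_analyze_response(status_code, body):
    """Interpret the JSON body returned by analyze_data.

    Returns the analysis_id on success, raises ValueError on a 400
    validation error, and RuntimeError on any other failure.
    """
    if status_code == 200 and body.get('success'):
        return body['analysis_id']  # UUID string for progress polling
    if status_code == 400:
        raise ValueError(body.get('error', 'invalid request'))
    raise RuntimeError(body.get('error', 'analysis failed to start'))
```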
Dependencies
flask, uuid, datetime, threading, logging, json, os, pandas, pathlib, tempfile, base64, time, werkzeug, typing
Required Imports
from flask import Flask, request, jsonify
import uuid
from datetime import datetime
import threading
import logging
Conditional/Optional Imports
These imports are only needed under specific conditions:
from config import Config, config
  Condition: required if run_analysis_async uses application configuration settings

from models import DataSource, DataSourceType, AnalysisConfiguration, AnalysisType
  Condition: required if run_analysis_async uses these model classes for data structures

from services import StatisticalAnalysisService
  Condition: required if run_analysis_async performs statistical analysis operations

from statistical_agent import StatisticalAgent
  Condition: required if run_analysis_async uses the statistical agent for analysis

from enhanced_sql_workflow import EnhancedSQLWorkflow
  Condition: required if run_analysis_async performs SQL-based data operations

from data_processor import DataProcessor
  Condition: required if run_analysis_async processes data before analysis

from script_executor import ScriptExecutor
  Condition: required if run_analysis_async executes analysis scripts

from dynamic_schema_discovery import DynamicSchemaDiscovery
  Condition: required if run_analysis_async performs schema discovery on data sources

Usage Example
# Flask application setup
from flask import Flask, request, jsonify
import uuid
from datetime import datetime
import threading
import logging

app = Flask(__name__)
logger = logging.getLogger(__name__)
analysis_progress = {}

def run_analysis_async(analysis_id, session_id, user_query, model, include_previous_context, selected_analyses):
    # Implementation of async analysis
    pass

@app.route('/analyze/<session_id>', methods=['POST'])
def analyze_data(session_id):
    # Function implementation here
    pass

# Client usage example (using the requests library):
import requests

response = requests.post(
    'http://localhost:5000/analyze/user-session-123',
    json={
        'query': 'What is the average sales by region?',
        'model': 'gpt-4o',
        'include_previous_context': True,
        'selected_analyses': ['descriptive', 'correlation']
    }
)

result = response.json()
if result['success']:
    print(f"Analysis started with ID: {result['analysis_id']}")
else:
    print(f"Error: {result['error']}")
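Because analyze_data returns before the analysis finishes, the client must poll for completion. A minimal sketch of that loop, written against an injectable fetcher so it is not tied to any particular route (the terminal status values 'completed' and 'error' are assumptions; match them to whatever run_analysis_async actually writes into analysis_progress):

```python
import time

def wait_for_analysis(fetch_state, poll_interval=2.0, timeout=300.0):
    """Poll until the progress dict reports a terminal status.

    fetch_state is any zero-argument callable returning the progress dict,
    e.g. lambda: requests.get(f'{base}/progress/{analysis_id}').json()
    against whatever route get_analysis_progress is registered on.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_state()
        if state.get('status') in ('completed', 'error'):
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f'analysis did not finish within {timeout}s')
```

Injecting the fetcher also makes the loop trivially testable with canned states, without a running server.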
Best Practices
- Always provide a 'query' field in the request body, as it is required for analysis
- The function uses daemon threads which will terminate when the main program exits; ensure proper cleanup if needed
- The analysis_progress dictionary should be thread-safe if accessed from multiple threads; consider using threading.Lock for production
- Monitor the analysis_progress dictionary for memory leaks in long-running applications; implement cleanup for completed analyses
- The default model is 'gpt-4o'; ensure API credentials are configured for the specified model
- Error handling returns appropriate HTTP status codes (400 for client errors, 500 for server errors)
- The function returns immediately after starting the thread; use the analysis_id to poll for progress via a separate endpoint
- Consider implementing timeout mechanisms for long-running analyses to prevent resource exhaustion
- Validate session_id exists and is authorized before processing the request in production environments
- The run_analysis_async function must handle its own error logging and update analysis_progress with error states
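The thread-safety and cleanup recommendations above can be combined into a small lock-protected wrapper around the progress dict. A sketch only: the field names mirror those used in analyze_data, while the prune_finished policy and terminal status values are assumptions, not part of the original code:

```python
import threading
from datetime import datetime

class ProgressStore:
    """Lock-protected replacement for the bare analysis_progress dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def create(self, analysis_id, session_id):
        with self._lock:
            self._entries[analysis_id] = {
                'status': 'starting',
                'progress': 0,
                'message': 'Initializing analysis...',
                'started_at': datetime.now().isoformat(),
                'session_id': session_id,
            }

    def update(self, analysis_id, **fields):
        # Called from the background thread as the analysis advances
        with self._lock:
            if analysis_id in self._entries:
                self._entries[analysis_id].update(fields)

    def get(self, analysis_id):
        # Return a copy so callers never read a dict being mutated
        with self._lock:
            entry = self._entries.get(analysis_id)
            return dict(entry) if entry else None

    def prune_finished(self):
        """Drop entries in a terminal state; returns how many were removed."""
        with self._lock:
            done = [k for k, v in self._entries.items()
                    if v.get('status') in ('completed', 'error')]
            for k in done:
                del self._entries[k]
            return len(done)
```

Swapping this in changes only the two touch points in analyze_data (create) and run_analysis_async (update), and a periodic prune_finished call addresses the memory-leak concern.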
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
- function analysis_chat (77.4% similar)
- function get_analysis_progress (77.2% similar)
- function data_section_analysis_chat (75.2% similar)
- function smartstat_run_analysis (72.8% similar)
- function run_analysis_async (72.1% similar)