
function analyze_data

Maturity: 50

Flask route handler that starts an asynchronous data analysis based on a user query, running the analysis in a background thread and returning an analysis ID for progress tracking.

File: /tf/active/vicechatdev/full_smartstat/app.py
Lines: 1102 - 1146
Complexity: moderate

Purpose

This endpoint is the entry point for starting data analysis tasks in the web application. It accepts a user query along with configuration options (model selection, whether to include previous context, and a list of specific analyses), rejects requests that have no query, creates a unique analysis ID for tracking, and spawns a background thread to execute the analysis. The API therefore responds immediately while the potentially long-running analysis continues in the background. Because every call gets its own ID and thread, a single session can have several analyses running concurrently.
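The start-and-track pattern described above can be sketched without Flask. This is a minimal illustration assuming an in-memory dict like analysis_progress; the names start_analysis, fake_worker, progress and progress_lock are hypothetical, and the worker only sleeps instead of running a real analysis.

```python
import threading
import time
import uuid
from datetime import datetime

# In-memory progress registry, analogous to analysis_progress in app.py
progress = {}
progress_lock = threading.Lock()


def fake_worker(analysis_id, query):
    """Hypothetical stand-in for run_analysis_async: simulates work and reports progress."""
    for pct in (25, 50, 75):
        time.sleep(0.01)  # pretend to do a chunk of work
        with progress_lock:
            progress[analysis_id].update(status='running', progress=pct)
    with progress_lock:
        progress[analysis_id].update(status='completed', progress=100,
                                     message=f'Finished: {query}')


def start_analysis(session_id, query):
    """Register a progress entry, start a daemon thread, and return the ID immediately."""
    analysis_id = str(uuid.uuid4())
    with progress_lock:
        progress[analysis_id] = {
            'status': 'starting',
            'progress': 0,
            'message': 'Initializing analysis...',
            'started_at': datetime.now().isoformat(),
            'session_id': session_id,
        }
    thread = threading.Thread(target=fake_worker, args=(analysis_id, query), daemon=True)
    thread.start()
    return analysis_id, thread
```

The real endpoint discards the thread handle; it is returned here only so a demo or test can call join() and then inspect progress[analysis_id].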

Source Code

def analyze_data(session_id):
    """Start asynchronous analysis process"""
    try:
        data = request.get_json()
        user_query = data.get('query', '')
        model = data.get('model', 'gpt-4o')  # Default to gpt-4o
        include_previous_context = data.get('include_previous_context', True)  # Default to True
        selected_analyses = data.get('selected_analyses', [])  # Specific analyses to include
        
        if not user_query:
            return jsonify({
                'success': False,
                'error': 'No query provided'
            }), 400
        
        # Initialize progress tracking
        analysis_id = str(uuid.uuid4())
        analysis_progress[analysis_id] = {
            'status': 'starting',
            'progress': 0,
            'message': 'Initializing analysis...',
            'started_at': datetime.now().isoformat(),
            'session_id': session_id
        }
        
        # Start analysis in background thread
        thread = threading.Thread(
            target=run_analysis_async,
            args=(analysis_id, session_id, user_query, model, include_previous_context, selected_analyses)
        )
        thread.daemon = True
        thread.start()
        
        return jsonify({
            'success': True,
            'analysis_id': analysis_id,
            'message': 'Analysis started'
        })
        
    except Exception as e:
        logger.error(f"Error starting analysis: {str(e)}")
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500

Parameters

Name        Type  Default  Kind
session_id  -     -        positional_or_keyword

Parameter Details

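session_id: String identifier for the user session. It is stored in the progress entry and passed to run_analysis_async so the analysis can be tied to the session and its data. It is presumably bound from the URL path; the route decorator lies outside the extracted lines, and the usage example below assumes /analyze/<session_id>.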

Return Value

Returns a Flask response. On success: jsonify({'success': True, 'analysis_id': str, 'message': 'Analysis started'}) with Flask's default 200 status, where analysis_id is a UUID4 string used to track the analysis progress. When no query is provided: (jsonify({'success': False, 'error': 'No query provided'}), 400). On any unexpected exception: (jsonify({'success': False, 'error': str}), 500), after the exception message has been logged.
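To make the three outcomes concrete, here is a framework-free sketch that mirrors the endpoint's branching and returns (body, status) pairs. It is an illustration, not the app.py code: build_response and its start callback are hypothetical, and the success status is written out as 200, whereas the real handler returns jsonify(...) alone and relies on Flask's default.

```python
def build_response(data, start):
    """Mirror analyze_data's branches; `start` is a callable returning a new analysis ID."""
    try:
        user_query = data.get('query', '')  # a None body raises AttributeError here
        if not user_query:
            return {'success': False, 'error': 'No query provided'}, 400
        analysis_id = start(user_query)
        return {'success': True, 'analysis_id': analysis_id,
                'message': 'Analysis started'}, 200
    except Exception as e:  # mirrors the broad catch in the real handler
        return {'success': False, 'error': str(e)}, 500
```

One consequence of the broad except: in the original handler, a request without a JSON body most likely ends up in the 500 branch rather than the 400 one, because whatever error get_json() or data.get() raises is caught there.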

Dependencies

  • flask
  • uuid
  • datetime
  • threading
  • logging
  • json
  • os
  • pandas
  • pathlib
  • tempfile
  • base64
  • time
  • werkzeug
  • typing

Required Imports

from flask import Flask, request, jsonify
import uuid
from datetime import datetime
import threading
import logging

Conditional/Optional Imports

These imports are only needed under specific conditions:

from config import Config, config

Condition: Required if run_analysis_async function uses application configuration settings

Required (conditional)
from models import DataSource, DataSourceType, AnalysisConfiguration, AnalysisType

Condition: Required if run_analysis_async function uses these model classes for data structure

Required (conditional)
from services import StatisticalAnalysisService

Condition: Required if run_analysis_async performs statistical analysis operations

Required (conditional)
from statistical_agent import StatisticalAgent

Condition: Required if run_analysis_async uses the statistical agent for analysis

Required (conditional)
from enhanced_sql_workflow import EnhancedSQLWorkflow

Condition: Required if run_analysis_async performs SQL-based data operations

Required (conditional)
from data_processor import DataProcessor

Condition: Required if run_analysis_async processes data before analysis

Required (conditional)
from script_executor import ScriptExecutor

Condition: Required if run_analysis_async executes analysis scripts

Required (conditional)
from dynamic_schema_discovery import DynamicSchemaDiscovery

Condition: Required if run_analysis_async performs schema discovery on data sources

Required (conditional)

Usage Example

# Flask application setup
from flask import Flask, request, jsonify
import uuid
from datetime import datetime
import threading
import logging

app = Flask(__name__)
logger = logging.getLogger(__name__)
analysis_progress = {}

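def run_analysis_async(analysis_id, session_id, user_query, model, include_previous_context, selected_analyses):
    # Placeholder for the real worker defined in app.py; it is expected to run the
    # analysis and update analysis_progress[analysis_id] with status and results
    pass

@app.route('/analyze/<session_id>', methods=['POST'])  # route path assumed for this example
def analyze_data(session_id):
    # Paste the body from the Source Code section here; a bare `pass` returns None,
    # which Flask rejects as an invalid response
    pass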

# Client usage example (requests library); run this from a separate process
# while the Flask server is listening on localhost:5000
import requests

response = requests.post(
    'http://localhost:5000/analyze/user-session-123',
    json={
        'query': 'What is the average sales by region?',
        'model': 'gpt-4o',
        'include_previous_context': True,
        'selected_analyses': ['descriptive', 'correlation']
    },
    timeout=10,
)

result = response.json()
if result['success']:
    analysis_id = result['analysis_id']
    print(f'Analysis started with ID: {analysis_id}')
else:
    print(f'Error: {result["error"]}')
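Because the endpoint returns before the analysis finishes, a client has to poll for the result. The progress route is not part of this function (get_analysis_progress, listed under Similar Components, appears to serve it), so this sketch takes a fetch_status callable instead of a URL. With requests it might wrap a GET to a progress endpoint whose path is not shown here. The terminal status names 'completed' and 'failed' are assumptions, as is the helper name wait_for_analysis.

```python
import time


def wait_for_analysis(fetch_status, analysis_id, interval=1.0, timeout=300.0):
    """Poll fetch_status(analysis_id) until a terminal status or the timeout expires.

    fetch_status is a hypothetical callable returning the progress dict for the ID.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(analysis_id)
        if status.get('status') in ('completed', 'failed'):  # assumed terminal states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Analysis {analysis_id} did not finish in {timeout}s')
        time.sleep(interval)
```

Injecting the fetch function keeps the polling logic independent of the HTTP client and lets it be exercised with canned responses.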

Best Practices

  • Always provide a 'query' field in the request body, as it is required for analysis
  • The worker runs in a daemon thread, which is stopped abruptly when the interpreter exits; any analysis still in progress at shutdown is lost without cleanup
  • analysis_progress is written by the request thread and presumably updated by the worker thread; guard compound updates with a threading.Lock in production
  • This function never removes entries from analysis_progress, so the dictionary grows in long-running processes; make sure completed and failed analyses are cleaned up (get_analysis_progress is described as doing this after a timeout)
  • The default model is 'gpt-4o'; ensure API credentials are configured for the specified model
  • Error handling returns appropriate HTTP status codes (400 for client errors, 500 for server errors)
  • The function returns immediately after starting the thread; use the analysis_id to poll for progress via a separate endpoint
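  • Consider a timeout for long-running analyses to prevent resource exhaustion; Python threads cannot be stopped from outside, so the timeout has to be checked inside run_analysis_async (or the work moved to a separate process)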
  • Validate session_id exists and is authorized before processing the request in production environments
  • The run_analysis_async function must handle its own error logging and update analysis_progress with error states
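Several of the points above (locking, cleanup of finished entries, and having the worker record its own errors) can be combined in a small registry. This is a sketch assuming terminal states 'completed' and 'failed' and a time-to-live chosen by the caller; ProgressStore, purge_finished and run_with_error_capture are hypothetical names, not part of app.py.

```python
import threading
import time


class ProgressStore:
    """Lock-protected replacement for a bare analysis_progress dict (hypothetical)."""

    TERMINAL = ('completed', 'failed')  # assumed terminal status values

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def create(self, analysis_id, **fields):
        with self._lock:
            self._entries[analysis_id] = dict(fields, status='starting', progress=0)

    def update(self, analysis_id, **fields):
        with self._lock:
            entry = self._entries[analysis_id]
            entry.update(fields)
            if entry.get('status') in self.TERMINAL:
                entry['finished_at'] = time.monotonic()  # timestamp used for cleanup

    def get(self, analysis_id):
        with self._lock:
            entry = self._entries.get(analysis_id)
            # Return a copy so callers cannot mutate shared state without the lock
            return dict(entry) if entry is not None else None

    def purge_finished(self, ttl_seconds):
        """Drop terminal entries finished at least ttl_seconds ago; return the count removed."""
        cutoff = time.monotonic() - ttl_seconds
        with self._lock:
            stale = [k for k, v in self._entries.items()
                     if v.get('finished_at') is not None and v['finished_at'] <= cutoff]
            for k in stale:
                del self._entries[k]
            return len(stale)


def run_with_error_capture(store, analysis_id, work):
    """Run work() in the worker thread and record success or failure in the store."""
    try:
        result = work()
        store.update(analysis_id, status='completed', progress=100, result=result)
    except Exception as e:  # the worker reports its own errors instead of dying silently
        store.update(analysis_id, status='failed', error=str(e))
```

purge_finished could be called from the progress endpoint or a periodic timer; entries that are still running are never removed, only finished ones past their time-to-live.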

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function analysis_chat 77.4% similar

    Flask route handler that processes chat messages for data analysis sessions, verifying user authentication and session ownership before delegating to the data analysis service.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function get_analysis_progress 77.2% similar

    Flask route handler that retrieves the progress status of a running analysis task and performs cleanup of completed/failed analyses after a timeout period.

    From: /tf/active/vicechatdev/full_smartstat/app.py
  • function data_section_analysis_chat 75.2% similar

    Async Flask route handler that processes chat messages for data section analysis, managing conversation history and integrating with a statistical analysis service.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function smartstat_run_analysis 72.8% similar

    Flask API endpoint that initiates a SmartStat statistical analysis in a background thread, tracking progress and persisting results to a data section.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function run_analysis_async 72.1% similar

    Executes a data analysis workflow asynchronously with real-time progress tracking, including query interpretation, script generation, execution, and result finalization.

    From: /tf/active/vicechatdev/full_smartstat/app.py