class SmartStatSession

Maturity: 51

A session management class that encapsulates a SmartStat statistical analysis session, tracking data, analysis history, plots, and reports for a specific data section.

File: /tf/active/vicechatdev/vice_ai/smartstat_service.py
Lines: 997 - 1061
Complexity: moderate

Purpose

SmartStatSession holds the state of one statistical analysis session: its datasets, information sheets, analysis history, generated plots, and final report. It supports both a single primary dataframe (kept for backward compatibility) and multiple named datasets, and it records information about uploaded Excel sheets and which sheet is active. Its to_dict() method produces a summary for API responses that contains metadata, counts, and column information, not the data itself.

Source Code

class SmartStatSession:
    """Represents a SmartStat analysis session linked to a data section"""
    
    def __init__(self, session_id: str, data_section_id: str, title: str):
        self.session_id = session_id
        self.data_section_id = data_section_id
        self.title = title
        self.created_at = datetime.now()
        self.updated_at = datetime.now()
        self.dataframe = None  # Primary/main dataframe for backward compatibility
        self.datasets = {}  # Dictionary of named datasets {name: dataframe}
        self.info_sheets = {}  # Dictionary of information sheets {name: context_text}
        self.analysis_history = []
        self.generated_plots = []
        self.final_report = None
        self.excel_sheets = None  # Store info about Excel sheets if uploaded
        self.active_sheet = None  # Currently selected sheet name
        
    def to_dict(self):
        result = {
            'session_id': self.session_id,
            'data_section_id': self.data_section_id,
            'title': self.title,
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat(),
            'has_data': self.dataframe is not None or len(self.datasets) > 0,
            'analysis_count': len(self.analysis_history),
            'plots_count': len(self.generated_plots),
            'dataset_count': len(self.datasets),
            'info_sheet_count': len(self.info_sheets)
        }
        
        # Add primary dataframe details if available
        if self.dataframe is not None:
            result['rows'] = len(self.dataframe)
            result['columns'] = len(self.dataframe.columns)
            result['column_names'] = list(self.dataframe.columns)
            numeric_cols = self.dataframe.select_dtypes(include=['number']).columns.tolist()
            result['numeric_columns'] = numeric_cols
        
        # Add multi-dataset information
        if len(self.datasets) > 0:
            # Return as array for frontend compatibility
            result['datasets'] = []
            for name, df in self.datasets.items():
                result['datasets'].append({
                    'name': name,
                    'rows': len(df),
                    'columns': len(df.columns),
                    'column_names': list(df.columns),
                    'numeric_columns': df.select_dtypes(include=['number']).columns.tolist()
                })
        
        # Add information sheets context  
        if len(self.info_sheets) > 0:
            # Return as array with names AND content for frontend display
            result['info_sheets'] = [
                {
                    'name': name, 
                    'content': content[:2000] if len(content) > 2000 else content  # Limit to 2000 chars for display
                } 
                for name, content in self.info_sheets.items()
            ]
        
        return result

Parameters

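Name             Type  Default  Kind
session_id       str   -        positional or keyword
data_section_id  str   -        positional or keyword
title            str   -        positional or keyword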

Parameter Details

session_id: Unique identifier string for the session, typically a UUID or similar unique string used to reference this specific analysis session

data_section_id: Identifier string linking this session to a specific data section in the parent system, establishing the relationship between session and data source

title: Human-readable string title/name for the session, used for display and identification purposes in user interfaces

Return Value

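Instantiation returns a SmartStatSession object whose data attributes start out empty or None. The to_dict() method returns a dictionary intended for JSON serialization and API responses. It always contains session_id, data_section_id, title, the two timestamps as ISO 8601 strings, has_data, and the analysis, plot, dataset, and info sheet counts. Row, column, and column-name statistics are added only when the primary dataframe is set, and the datasets and info_sheets arrays are added only when those dictionaries are non-empty.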
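
Because to_dict() converts both timestamps with isoformat(), its result can usually be passed straight to json.dumps. The sketch below illustrates that round trip with a hand-written dictionary that mirrors the always-present keys; it does not use a real session, and all values are illustrative assumptions.

```python
import json
from datetime import datetime

# Hand-written stand-in for what to_dict() returns when no data is loaded.
# The values are illustrative and do not come from a real session.
now = datetime.now()
payload = {
    "session_id": "sess_12345",
    "data_section_id": "data_section_001",
    "title": "Sales Analysis Q4 2023",
    "created_at": now.isoformat(),
    "updated_at": now.isoformat(),
    "has_data": False,
    "analysis_count": 0,
    "plots_count": 0,
    "dataset_count": 0,
    "info_sheet_count": 0,
}

# Encode to JSON text and decode again, as an API client would.
encoded = json.dumps(payload)
decoded = json.loads(encoded)

# ISO 8601 strings can be turned back into datetime objects on the client side.
restored_created_at = datetime.fromisoformat(decoded["created_at"])
```

The timestamps survive the round trip as strings, so any consumer that needs datetime objects has to parse them explicitly.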

Class Interface

Methods

__init__(self, session_id: str, data_section_id: str, title: str)

Purpose: Initialize a new SmartStatSession with required identifiers and empty data structures

Parameters:

  • session_id: Unique identifier for this session
  • data_section_id: Identifier linking to parent data section
  • title: Human-readable session title

Returns: None (constructor)

to_dict(self) -> dict

Purpose: Serialize the session to a dictionary containing metadata, statistics, and data information suitable for JSON serialization

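Returns: Dictionary with the following keys:

  • Always present: session_id, data_section_id, title, created_at, updated_at, has_data, analysis_count, plots_count, dataset_count, info_sheet_count
  • Only when dataframe is set: rows, columns, column_names, numeric_columns
  • Only when datasets is non-empty: datasets (array with name, rows, columns, column_names, and numeric_columns for each dataset)
  • Only when info_sheets is non-empty: info_sheets (array of name and content pairs, with content truncated to 2000 characters)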
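
To make the per-dataset entries concrete, here is a minimal sketch of how one element of the datasets array is derived from a DataFrame. The helper name summarize_frame is hypothetical and not part of SmartStatSession; it only mirrors the fields that to_dict() builds inside its loop.

```python
import pandas as pd


def summarize_frame(name: str, df: pd.DataFrame) -> dict:
    """Hypothetical helper mirroring one entry of the 'datasets' array."""
    return {
        "name": name,
        "rows": len(df),
        "columns": len(df.columns),
        "column_names": list(df.columns),
        # select_dtypes(include=['number']) keeps integer and float columns only
        "numeric_columns": df.select_dtypes(include=["number"]).columns.tolist(),
    }


sales = pd.DataFrame(
    {"sales": [100, 200, 300], "region": ["North", "South", "East"]}
)
summary = summarize_frame("sales_data", sales)
```

Here the text column region appears in column_names but not in numeric_columns, which is how a frontend can tell which columns are candidates for numeric analysis.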

Attributes

All attributes are instance attributes.

  • session_id (str): Unique identifier for this analysis session
  • data_section_id (str): Identifier linking this session to a parent data section
  • title (str): Human-readable title/name for the session
  • created_at (datetime): Timestamp when the session was created, set automatically on initialization
  • updated_at (datetime): Timestamp of the last session update; the class does not refresh it itself, so callers should set it when session state changes
  • dataframe (pandas.DataFrame | None): Primary/main dataframe for backward compatibility with single-dataset workflows, initially None
  • datasets (dict[str, pandas.DataFrame]): Dictionary mapping dataset names to pandas DataFrames for multi-dataset workflows, initially empty
  • info_sheets (dict[str, str]): Dictionary mapping information sheet names to their text content/context, initially empty
  • analysis_history (list): History of analyses performed in this session, initially empty; entries are expected to be dictionaries of analysis metadata
  • generated_plots (list): Metadata about plots generated during analysis (paths, types, etc.), initially empty
  • final_report (Any | None): Storage for the final analysis report, format depends on implementation, initially None
  • excel_sheets (list | None): Information about the sheets of an uploaded Excel file (the usage example stores a list of sheet names), None if no Excel file was uploaded
  • active_sheet (str | None): Name of the currently selected/active Excel sheet, None if not applicable
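
The info_sheets attribute keeps the full text of every sheet, while to_dict() exposes at most the first 2000 characters of each. The following sketch reproduces that behavior in isolation; preview_info_sheets and DISPLAY_LIMIT are hypothetical names introduced for illustration only.

```python
DISPLAY_LIMIT = 2000  # same literal that to_dict() uses for truncation


def preview_info_sheets(info_sheets: dict) -> list:
    """Hypothetical helper mirroring the 'info_sheets' array built by to_dict()."""
    # Slicing alone is enough: text shorter than the limit is returned unchanged.
    return [
        {"name": name, "content": content[:DISPLAY_LIMIT]}
        for name, content in info_sheets.items()
    ]


info_sheets = {
    "methodology": "This analysis uses standard statistical methods...",
    "codebook": "x" * 5000,  # deliberately longer than the display limit
}
preview = preview_info_sheets(info_sheets)
```

Only the preview is shortened; the original dictionary still holds the complete text for later analysis steps.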

Dependencies

  • datetime
  • pandas

Required Imports

from datetime import datetime
import pandas as pd

Usage Example

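from datetime import datetime
import pandas as pd

# SmartStatSession is defined in vice_ai/smartstat_service.py; import it
# in whatever way matches your project layout before running this example.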

# Create a new session
session = SmartStatSession(
    session_id='sess_12345',
    data_section_id='data_section_001',
    title='Sales Analysis Q4 2023'
)

# Add primary dataframe (backward compatible)
df = pd.DataFrame({'sales': [100, 200, 300], 'region': ['North', 'South', 'East']})
session.dataframe = df

# Add multiple named datasets
session.datasets['sales_data'] = df
session.datasets['customer_data'] = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})

# Add information sheets
session.info_sheets['methodology'] = 'This analysis uses standard statistical methods...'

# Track analysis history
session.analysis_history.append({'type': 'descriptive', 'timestamp': datetime.now()})

# Add generated plots
session.generated_plots.append({'plot_type': 'histogram', 'path': '/plots/hist_001.png'})

# Set Excel sheet information
session.excel_sheets = ['Sheet1', 'Sheet2', 'Summary']
session.active_sheet = 'Sheet1'

# Update timestamp
session.updated_at = datetime.now()

# Serialize to dictionary for API response
session_dict = session.to_dict()
print(f"Session has {session_dict['dataset_count']} datasets and {session_dict['analysis_count']} analyses")

Best Practices

  • Always update the updated_at timestamp when modifying session state to track changes accurately
  • Use the datasets dictionary for multi-dataset workflows; the dataframe attribute is maintained for backward compatibility
  • When adding datasets, ensure they are pandas DataFrame objects to maintain consistency
  • Keep in mind that to_dict() returns at most the first 2000 characters of each info sheet; the full text remains available in info_sheets
  • Track analysis operations by appending to analysis_history with relevant metadata (type, timestamp, parameters)
  • Store plot metadata (not actual plot objects) in generated_plots with references to file paths or identifiers
  • Set active_sheet when working with Excel files to indicate which sheet is currently being analyzed
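  • Use to_dict() to build JSON-friendly summaries for API responses; it reports metadata, counts, and column information but not the dataframe contents, analysis history entries, or plots, so it is not sufficient on its own to restore a session
  • Treat the dataframe attribute and the datasets dictionary as independent stores: assigning one does not populate the other
  • Check has_data in the to_dict() result before running analyses; it is True when dataframe is set or datasets is non-empty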
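
Since the class never refreshes updated_at on its own, one way to follow the first practice is to route state changes through a small helper. The sketch below is an assumption about how that could look: record_analysis is a hypothetical function, and a SimpleNamespace stands in for a real SmartStatSession so the example runs by itself.

```python
from datetime import datetime
from types import SimpleNamespace


def record_analysis(session, entry: dict) -> None:
    """Hypothetical helper: append an analysis entry and refresh updated_at."""
    session.analysis_history.append(entry)
    session.updated_at = datetime.now()


# Stand-in object exposing only the two attributes the helper touches;
# a real SmartStatSession instance could be passed in the same way.
created = datetime.now()
session = SimpleNamespace(analysis_history=[], updated_at=created)

record_analysis(
    session,
    {"type": "descriptive", "timestamp": datetime.now().isoformat()},
)
```

Storing the entry timestamp as an ISO string rather than a datetime object keeps the history entries JSON-friendly if they are ever serialized.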

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class StatisticalSession 80.7% similar

    A dataclass representing a statistical analysis session that tracks metadata, configuration, and status of data analysis operations.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class StatisticalSession_v1 77.7% similar

    A dataclass representing a statistical analysis session that tracks user data analysis workflows, including data sources, configurations, and execution status.

    From: /tf/active/vicechatdev/smartstat/models.py
  • class DataAnalysisSession_v1 75.8% similar

    A dataclass representing a statistical analysis session that is linked to specific document sections, managing analysis state, messages, plots, and configuration.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataAnalysisSession 72.7% similar

    A dataclass representing a data analysis session that is linked to a specific text section within a document, managing conversation messages, analysis results, plots, and configuration.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • function smartstat_get_history 70.4% similar

    Flask API endpoint that retrieves analysis history for a SmartStat session, with automatic session recovery from saved data if the session is not found in memory.

    From: /tf/active/vicechatdev/vice_ai/new_app.py