SmartStatSession - Code Extractor

class SmartStatSession

Maturity: 51

A session management class that encapsulates a SmartStat statistical analysis session, tracking data, analysis history, plots, and reports for a specific data section.

File:
/tf/active/vicechatdev/vice_ai/smartstat_service.py

Lines:
997 - 1061

Complexity:
moderate

Purpose

SmartStatSession manages the complete lifecycle of a statistical analysis session. It stores and organizes multiple datasets, information sheets, analysis history, generated plots, and final reports. The class supports both single dataframe (backward compatible) and multi-dataset workflows, tracks Excel sheet information, and provides serialization capabilities for session persistence and API responses.

Source Code

class SmartStatSession:
    """Represents a SmartStat analysis session linked to a data section"""
    
    def __init__(self, session_id: str, data_section_id: str, title: str):
        self.session_id = session_id
        self.data_section_id = data_section_id
        self.title = title
        self.created_at = datetime.now()
        self.updated_at = datetime.now()
        self.dataframe = None  # Primary/main dataframe for backward compatibility
        self.datasets = {}  # Dictionary of named datasets {name: dataframe}
        self.info_sheets = {}  # Dictionary of information sheets {name: context_text}
        self.analysis_history = []
        self.generated_plots = []
        self.final_report = None
        self.excel_sheets = None  # Store info about Excel sheets if uploaded
        self.active_sheet = None  # Currently selected sheet name
        
    def to_dict(self):
        result = {
            'session_id': self.session_id,
            'data_section_id': self.data_section_id,
            'title': self.title,
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat(),
            'has_data': self.dataframe is not None or len(self.datasets) > 0,
            'analysis_count': len(self.analysis_history),
            'plots_count': len(self.generated_plots),
            'dataset_count': len(self.datasets),
            'info_sheet_count': len(self.info_sheets)
        }
        
        # Add primary dataframe details if available
        if self.dataframe is not None:
            result['rows'] = len(self.dataframe)
            result['columns'] = len(self.dataframe.columns)
            result['column_names'] = list(self.dataframe.columns)
            numeric_cols = self.dataframe.select_dtypes(include=['number']).columns.tolist()
            result['numeric_columns'] = numeric_cols
        
        # Add multi-dataset information
        if len(self.datasets) > 0:
            # Return as array for frontend compatibility
            result['datasets'] = []
            for name, df in self.datasets.items():
                result['datasets'].append({
                    'name': name,
                    'rows': len(df),
                    'columns': len(df.columns),
                    'column_names': list(df.columns),
                    'numeric_columns': df.select_dtypes(include=['number']).columns.tolist()
                })
        
        # Add information sheets context  
        if len(self.info_sheets) > 0:
            # Return as array with names AND content for frontend display
            result['info_sheets'] = [
                {
                    'name': name, 
                    'content': content[:2000] if len(content) > 2000 else content  # Limit to 2000 chars for display
                } 
                for name, content in self.info_sheets.items()
            ]
        
        return result

Parameters

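Name	Type	Default	Kind
`session_id`	str	-	positional or keyword
`data_section_id`	str	-	positional or keyword
`title`	str	-	positional or keyword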

Parameter Details

session_id: Unique identifier string for the session, typically a UUID or similar unique string used to reference this specific analysis session

data_section_id: Identifier string linking this session to a specific data section in the parent system, establishing the relationship between session and data source

title: Human-readable string title/name for the session, used for display and identification purposes in user interfaces

Return Value

Instantiation returns a SmartStatSession object with initialized attributes. The to_dict() method returns a dictionary containing session metadata including session_id, data_section_id, title, timestamps, data statistics (rows, columns, column names), dataset information, and analysis counts suitable for JSON serialization and API responses.
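Several keys in the returned dictionary are conditional: rows, columns, column_names and numeric_columns appear only when a primary dataframe is set, and datasets and info_sheets appear only when those dictionaries are non-empty. The sketch below shows one way a caller might read such a dictionary defensively. The describe_summary function and the sample dictionaries are hypothetical illustrations, not part of smartstat_service.py.

```python
from typing import Any


def describe_summary(summary: dict[str, Any]) -> str:
    """Build a one-line description from a to_dict()-style dictionary.

    Keys such as 'rows' and 'datasets' are only present when the session
    holds the corresponding data, so they are checked before use.
    """
    parts = [f"{summary['title']}: {summary['analysis_count']} analyses"]

    # Primary-dataframe details exist only if session.dataframe was set.
    if "rows" in summary:
        parts.append(f"primary {summary['rows']}x{summary['columns']}")

    # 'datasets' is a list of per-dataset dicts, absent when there are none.
    names = [d["name"] for d in summary.get("datasets", [])]
    if names:
        parts.append("datasets: " + ", ".join(names))

    if not summary["has_data"]:
        parts.append("no data loaded")
    return "; ".join(parts)


# Hand-written samples mirroring a subset of the documented keys (not real output).
empty = {"title": "Demo", "analysis_count": 0, "has_data": False}
loaded = {
    "title": "Demo", "analysis_count": 2, "has_data": True,
    "rows": 3, "columns": 2,
    "datasets": [{"name": "sales_data"}, {"name": "customer_data"}],
}
print(describe_summary(empty))
print(describe_summary(loaded))
```

Checking for the key rather than indexing directly avoids a KeyError for sessions that have no data yet.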

Class Interface

Methods

`__init__(self, session_id: str, data_section_id: str, title: str)`

Purpose: Initialize a new SmartStatSession with required identifiers and empty data structures

Parameters:

session_id: Unique identifier for this session
data_section_id: Identifier linking to parent data section
title: Human-readable session title

Returns: None (constructor)

`to_dict(self) -> dict`

Purpose: Serialize the session to a dictionary containing metadata, statistics, and data information suitable for JSON serialization

Returns: Dictionary with keys: session_id, data_section_id, title, created_at, updated_at, has_data, analysis_count, plots_count, dataset_count, info_sheet_count, and optionally rows, columns, column_names, numeric_columns (for primary dataframe), datasets (array of dataset info), and info_sheets (array with names and truncated content)
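The truncation of info sheet content comes down to a plain string slice. The following sketch isolates that behavior in a hypothetical preview helper (the name and the DISPLAY_LIMIT constant are assumptions for illustration; the source inlines the value 2000). It also shows that the explicit length check in the source is not strictly needed, since slicing past the end of a string is safe.

```python
DISPLAY_LIMIT = 2000  # same limit as the slice used in to_dict()


def preview(content: str, limit: int = DISPLAY_LIMIT) -> str:
    """Return at most `limit` characters; shorter strings pass through unchanged."""
    # Slicing beyond the end of a string does not raise, so no length check is required.
    return content[:limit]


long_sheet = "x" * 2500
short_sheet = "Units are in EUR."

assert len(preview(long_sheet)) == 2000
assert preview(short_sheet) == short_sheet
# The original text is untouched; only the preview is shortened.
assert len(long_sheet) == 2500
```

Note that the truncation is silent: the serialized entry carries no marker indicating that content was cut, so a consumer that needs to know would have to compare lengths itself.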

Attributes

Name	Type	Description	Scope
`session_id`	str	Unique identifier for this analysis session	instance
`data_section_id`	str	Identifier linking this session to a parent data section	instance
`title`	str	Human-readable title/name for the session	instance
`created_at`	datetime	Timestamp when the session was created, set automatically on initialization	instance
`updated_at`	datetime	Timestamp of last session update, should be manually updated when session state changes	instance
`dataframe`	pandas.DataFrame | None	Primary/main dataframe for backward compatibility with single-dataset workflows, initially None	instance
`datasets`	dict[str, pandas.DataFrame]	Dictionary mapping dataset names to pandas DataFrames for multi-dataset workflows, initially empty	instance
`info_sheets`	dict[str, str]	Dictionary mapping information sheet names to their text content/context, initially empty	instance
`analysis_history`	list	List storing history of analyses performed in this session, append analysis metadata as dictionaries	instance
`generated_plots`	list	List storing metadata about plots generated during analysis (paths, types, etc.), initially empty	instance
`final_report`	Any | None	Storage for the final analysis report, format depends on implementation, initially None	instance
`excel_sheets`	list | None	List of sheet names if data was uploaded from Excel file, None if not from Excel	instance
`active_sheet`	str | None	Name of the currently selected/active Excel sheet, None if not applicable	instance

Dependencies

datetime
pandas

Required Imports

from datetime import datetime
import pandas as pd

Usage Example

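from datetime import datetime
import pandas as pd

# Assumption: SmartStatSession is importable from the smartstat_service module
# listed under "File"; adjust the import to match your project layout.
# from smartstat_service import SmartStatSession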

# Create a new session
session = SmartStatSession(
    session_id='sess_12345',
    data_section_id='data_section_001',
    title='Sales Analysis Q4 2023'
)

# Add primary dataframe (backward compatible)
df = pd.DataFrame({'sales': [100, 200, 300], 'region': ['North', 'South', 'East']})
session.dataframe = df

# Add multiple named datasets
session.datasets['sales_data'] = df
session.datasets['customer_data'] = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})

# Add information sheets
session.info_sheets['methodology'] = 'This analysis uses standard statistical methods...'

# Track analysis history
session.analysis_history.append({'type': 'descriptive', 'timestamp': datetime.now()})

# Add generated plots
session.generated_plots.append({'plot_type': 'histogram', 'path': '/plots/hist_001.png'})

# Set Excel sheet information
session.excel_sheets = ['Sheet1', 'Sheet2', 'Summary']
session.active_sheet = 'Sheet1'

# Update timestamp
session.updated_at = datetime.now()

# Serialize to dictionary for API response
session_dict = session.to_dict()
print(f"Session has {session_dict['dataset_count']} datasets and {session_dict['analysis_count']} analyses")
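Because to_dict() converts both timestamps with isoformat(), the fixed part of its result contains only JSON-native values. The sketch below illustrates the round trip using a hand-built dictionary that mirrors a few of those keys; it is a stand-in, not actual output of the class. Column labels that are not plain strings or numbers (for example date labels) may need converting before json.dumps accepts them.

```python
import json
from datetime import datetime

# Hand-built stand-in for part of a to_dict() result.
created = datetime(2023, 12, 1, 9, 30, 0)
summary = {
    "session_id": "sess_12345",
    "title": "Sales Analysis Q4 2023",
    "created_at": created.isoformat(),
    "has_data": True,
    "column_names": ["sales", "region"],
}

payload = json.dumps(summary)   # succeeds: only str, bool and list values
restored = json.loads(payload)

# ISO 8601 strings can be parsed back into datetime objects when needed.
restored_created = datetime.fromisoformat(restored["created_at"])
print(restored_created == created)
```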

Best Practices

Update updated_at whenever you modify session state; the code shown sets it only in __init__ and does not refresh it automatically.
Use the datasets dictionary for multi-dataset workflows; the dataframe attribute is kept for backward compatibility with single-dataframe code.
Store only pandas DataFrame objects in datasets, because to_dict() calls len(df), df.columns, and df.select_dtypes() on every entry.
to_dict() truncates each info sheet to 2000 characters in its output only; the full text stays in info_sheets, so do not rely on the serialized content being complete.
Track analysis operations by appending to analysis_history with relevant metadata (type, timestamp, parameters).
Store plot metadata (not plot objects) in generated_plots, with references to file paths or identifiers.
Set active_sheet when working with Excel files to indicate which sheet is being analyzed.
Use to_dict() to build a JSON-friendly summary for API responses. It omits the dataframes themselves, the entries of analysis_history and generated_plots, final_report, excel_sheets, and active_sheet, so it is not sufficient on its own to restore a session.
Check has_data in the to_dict() result to determine whether the session contains any data before performing operations.
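The first and fifth practices above (refreshing updated_at and recording analysis metadata) can be combined in one small helper. This is a minimal sketch under stated assumptions: record_analysis is a hypothetical function that does not exist in smartstat_service.py, and a SimpleNamespace stands in for a real session so the example runs without the class or pandas.

```python
from datetime import datetime
from types import SimpleNamespace


def record_analysis(session, analysis_type: str, **params) -> dict:
    """Append an entry to session.analysis_history and refresh updated_at."""
    now = datetime.now()
    entry = {"type": analysis_type, "timestamp": now.isoformat(), "parameters": params}
    session.analysis_history.append(entry)
    session.updated_at = now  # keep the timestamp in step with the change
    return entry


# Stand-in object with just the two attributes the helper touches;
# a real SmartStatSession instance would be passed the same way.
session = SimpleNamespace(analysis_history=[], updated_at=datetime(2023, 1, 1))
entry = record_analysis(session, "descriptive", columns=["sales"])

print(len(session.analysis_history))
print(session.updated_at > datetime(2023, 1, 1))
```

Storing the timestamp as an ISO string rather than a datetime object keeps each history entry JSON-serializable, which is a design choice of this sketch and not something the source prescribes.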

Similar Components

AI-powered semantic similarity - components with related functionality:

class StatisticalSession 80.7% similar

A dataclass representing a statistical analysis session that tracks metadata, configuration, and status of data analysis operations.
From: /tf/active/vicechatdev/vice_ai/smartstat_models.py

class StatisticalSession_v1 77.7% similar

A dataclass representing a statistical analysis session that tracks user data analysis workflows, including data sources, configurations, and execution status.
From: /tf/active/vicechatdev/smartstat/models.py

class DataAnalysisSession_v1 75.8% similar

A dataclass representing a statistical analysis session that is linked to specific document sections, managing analysis state, messages, plots, and configuration.
From: /tf/active/vicechatdev/vice_ai/models.py

class DataAnalysisSession 72.7% similar

A dataclass representing a data analysis session that is linked to a specific text section within a document, managing conversation messages, analysis results, plots, and configuration.
From: /tf/active/vicechatdev/vice_ai/models.py

function smartstat_get_history 70.4% similar

Flask API endpoint that retrieves analysis history for a SmartStat session, with automatic session recovery from saved data if the session is not found in memory.
From: /tf/active/vicechatdev/vice_ai/new_app.py
