class SmartStatSession
A session management class that encapsulates a SmartStat statistical analysis session, tracking data, analysis history, plots, and reports for a specific data section.
/tf/active/vicechatdev/vice_ai/smartstat_service.py
997 - 1061
moderate
Purpose
SmartStatSession holds the state of a statistical analysis session. It stores multiple named datasets, information sheets, the analysis history, generated plots, and the final report. The class supports both a single primary dataframe (kept for backward compatibility) and multi-dataset workflows, tracks Excel sheet information, and exposes to_dict(), which serializes a summary of the session (metadata, counts, and column information, not the data itself) for API responses or for persisting session metadata.
Source Code
class SmartStatSession:
    """Represents a SmartStat analysis session linked to a data section"""

    def __init__(self, session_id: str, data_section_id: str, title: str):
        self.session_id = session_id
        self.data_section_id = data_section_id
        self.title = title
        self.created_at = datetime.now()
        self.updated_at = datetime.now()
        self.dataframe = None  # Primary/main dataframe for backward compatibility
        self.datasets = {}  # Dictionary of named datasets {name: dataframe}
        self.info_sheets = {}  # Dictionary of information sheets {name: context_text}
        self.analysis_history = []
        self.generated_plots = []
        self.final_report = None
        self.excel_sheets = None  # Store info about Excel sheets if uploaded
        self.active_sheet = None  # Currently selected sheet name

    def to_dict(self):
        result = {
            'session_id': self.session_id,
            'data_section_id': self.data_section_id,
            'title': self.title,
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat(),
            'has_data': self.dataframe is not None or len(self.datasets) > 0,
            'analysis_count': len(self.analysis_history),
            'plots_count': len(self.generated_plots),
            'dataset_count': len(self.datasets),
            'info_sheet_count': len(self.info_sheets)
        }

        # Add primary dataframe details if available
        if self.dataframe is not None:
            result['rows'] = len(self.dataframe)
            result['columns'] = len(self.dataframe.columns)
            result['column_names'] = list(self.dataframe.columns)
            numeric_cols = self.dataframe.select_dtypes(include=['number']).columns.tolist()
            result['numeric_columns'] = numeric_cols

        # Add multi-dataset information
        if len(self.datasets) > 0:
            # Return as array for frontend compatibility
            result['datasets'] = []
            for name, df in self.datasets.items():
                result['datasets'].append({
                    'name': name,
                    'rows': len(df),
                    'columns': len(df.columns),
                    'column_names': list(df.columns),
                    'numeric_columns': df.select_dtypes(include=['number']).columns.tolist()
                })

        # Add information sheets context
        if len(self.info_sheets) > 0:
            # Return as array with names AND content for frontend display
            result['info_sheets'] = [
                {
                    'name': name,
                    'content': content[:2000] if len(content) > 2000 else content  # Limit to 2000 chars for display
                }
                for name, content in self.info_sheets.items()
            ]

        return result
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
session_id: Unique identifier string for the session, typically a UUID or similar unique string used to reference this specific analysis session
data_section_id: Identifier string linking this session to a specific data section in the parent system, establishing the relationship between session and data source
title: Human-readable string title/name for the session, used for display and identification purposes in user interfaces
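Since session_id is described as typically a UUID, one way to produce the constructor arguments is sketched below. The helper name `new_session_kwargs` is hypothetical and not part of the documented module; it only illustrates generating a unique identifier with the standard library.

```python
import uuid


def new_session_kwargs(data_section_id: str, title: str) -> dict:
    """Build constructor keyword arguments with a freshly generated session_id."""
    return {
        "session_id": str(uuid.uuid4()),  # random UUID rendered as a 36-character string
        "data_section_id": data_section_id,
        "title": title,
    }


kwargs = new_session_kwargs("data_section_001", "Sales Analysis Q4 2023")
# With the class from the Source Code section in scope, the session could
# then be created as: session = SmartStatSession(**kwargs)
print(sorted(kwargs))
```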
Return Value
Instantiation returns a SmartStatSession object with initialized attributes. The to_dict() method returns a dictionary containing session metadata including session_id, data_section_id, title, timestamps, data statistics (rows, columns, column names), dataset information, and analysis counts suitable for JSON serialization and API responses.
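In the source, the row and column details are added only when the primary dataframe is set, and the datasets list only when at least one named dataset is stored, so code that consumes the dictionary has to tolerate missing keys. The sketch below is illustrative: `describe_summary` is a hypothetical helper, and the sample dictionary is hand-written and abridged to match the documented keys rather than produced by a real session.

```python
def describe_summary(summary: dict) -> str:
    """Build a one-line description from a to_dict()-style dictionary."""
    if not summary.get("has_data"):
        return f"{summary['title']}: no data loaded"
    parts = []
    # 'rows' and 'columns' exist only when the primary dataframe is set.
    if "rows" in summary:
        parts.append(f"primary {summary['rows']}x{summary['columns']}")
    # 'datasets' exists only when at least one named dataset is stored.
    for ds in summary.get("datasets", []):
        parts.append(f"{ds['name']} {ds['rows']}x{ds['columns']}")
    return f"{summary['title']}: " + ", ".join(parts)


# Hand-written, abridged sample with the documented shape (not real output).
sample = {
    "session_id": "sess_12345",
    "data_section_id": "data_section_001",
    "title": "Sales Analysis",
    "has_data": True,
    "analysis_count": 0,
    "plots_count": 0,
    "dataset_count": 1,
    "info_sheet_count": 0,
    "datasets": [
        {
            "name": "sales_data",
            "rows": 3,
            "columns": 2,
            "column_names": ["sales", "region"],
            "numeric_columns": ["sales"],
        }
    ],
}
print(describe_summary(sample))
```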
Class Interface
Methods
__init__(self, session_id: str, data_section_id: str, title: str)
Purpose: Initialize a new SmartStatSession with required identifiers and empty data structures
Parameters:
session_id: Unique identifier for this session
data_section_id: Identifier linking to parent data section
title: Human-readable session title
Returns: None (constructor)
to_dict(self) -> dict
Purpose: Serialize the session to a dictionary containing metadata, statistics, and data information suitable for JSON serialization
Returns: Dictionary with keys: session_id, data_section_id, title, created_at, updated_at, has_data, analysis_count, plots_count, dataset_count, info_sheet_count, and optionally rows, columns, column_names, numeric_columns (for primary dataframe), datasets (array of dataset info), and info_sheets (array with names and truncated content)
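The truncation of info sheet content can be shown in isolation. The function below is a standalone sketch that mirrors the list comprehension in to_dict(); the name `preview_info_sheets` and the `limit` parameter are assumptions added for illustration (the source hard-codes 2000).

```python
def preview_info_sheets(info_sheets: dict, limit: int = 2000) -> list:
    """Return name/content pairs with content cut to at most `limit` characters."""
    return [
        # Slicing leaves shorter strings unchanged, so no length check is needed.
        {"name": name, "content": content[:limit]}
        for name, content in info_sheets.items()
    ]


sheets = {"methodology": "x" * 2500, "notes": "short text"}
preview = preview_info_sheets(sheets)
print([len(item["content"]) for item in preview])  # [2000, 10]
```

Only the returned copy is shortened; the strings held in the original dictionary keep their full length.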
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| session_id | str | Unique identifier for this analysis session | instance |
| data_section_id | str | Identifier linking this session to a parent data section | instance |
| title | str | Human-readable title/name for the session | instance |
| created_at | datetime | Timestamp when the session was created, set automatically on initialization | instance |
| updated_at | datetime | Timestamp of last session update, should be manually updated when session state changes | instance |
| dataframe | pandas.DataFrame or None | Primary/main dataframe for backward compatibility with single-dataset workflows, initially None | instance |
| datasets | dict[str, pandas.DataFrame] | Dictionary mapping dataset names to pandas DataFrames for multi-dataset workflows, initially empty | instance |
| info_sheets | dict[str, str] | Dictionary mapping information sheet names to their text content/context, initially empty | instance |
| analysis_history | list | List storing history of analyses performed in this session, append analysis metadata as dictionaries | instance |
| generated_plots | list | List storing metadata about plots generated during analysis (paths, types, etc.), initially empty | instance |
| final_report | Any or None | Storage for the final analysis report, format depends on implementation, initially None | instance |
| excel_sheets | list or None | List of sheet names if data was uploaded from Excel file, None if not from Excel | instance |
| active_sheet | str or None | Name of the currently selected/active Excel sheet, None if not applicable | instance |
Dependencies
- datetime
- pandas
Required Imports
from datetime import datetime
import pandas as pd
Usage Example
from datetime import datetime
import pandas as pd
# Create a new session
session = SmartStatSession(
    session_id='sess_12345',
    data_section_id='data_section_001',
    title='Sales Analysis Q4 2023'
)
# Add primary dataframe (backward compatible)
df = pd.DataFrame({'sales': [100, 200, 300], 'region': ['North', 'South', 'East']})
session.dataframe = df
# Add multiple named datasets
session.datasets['sales_data'] = df
session.datasets['customer_data'] = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
# Add information sheets
session.info_sheets['methodology'] = 'This analysis uses standard statistical methods...'
# Track analysis history
session.analysis_history.append({'type': 'descriptive', 'timestamp': datetime.now()})
# Add generated plots
session.generated_plots.append({'plot_type': 'histogram', 'path': '/plots/hist_001.png'})
# Set Excel sheet information
session.excel_sheets = ['Sheet1', 'Sheet2', 'Summary']
session.active_sheet = 'Sheet1'
# Update timestamp
session.updated_at = datetime.now()
# Serialize to dictionary for API response
session_dict = session.to_dict()
print(f"Session has {session_dict['dataset_count']} datasets and {session_dict['analysis_count']} analyses")
Best Practices
- Always update the updated_at timestamp when modifying session state to track changes accurately
- Use the datasets dictionary for multi-dataset workflows; the dataframe attribute is maintained for backward compatibility
- When adding datasets, ensure they are pandas DataFrame objects to maintain consistency
- Keep in mind that to_dict() returns only the first 2000 characters of each info sheet for display; the full text remains available in info_sheets
- Track analysis operations by appending to analysis_history with relevant metadata (type, timestamp, parameters)
- Store plot metadata (not actual plot objects) in generated_plots with references to file paths or identifiers
- Set active_sheet when working with Excel files to indicate which sheet is currently being analyzed
- Use to_dict() to obtain a JSON-friendly summary for API responses; it reports counts and column metadata only, so the dataframes themselves must be persisted separately
- Check has_data in to_dict() result to determine if session contains any data before performing operations
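The class does not refresh updated_at on its own, so the first and fifth practices above can be combined in a small helper. This is a minimal sketch: `record_analysis` is a hypothetical function, the entry keys are assumptions based on the metadata suggested above, and a `SimpleNamespace` stands in for a real session so the example runs without the class.

```python
from datetime import datetime
from types import SimpleNamespace


def record_analysis(session, analysis_type: str, **params) -> dict:
    """Append an entry to analysis_history and refresh updated_at."""
    now = datetime.now()
    entry = {
        "type": analysis_type,
        "timestamp": now.isoformat(),  # stored as a string so the entry is JSON-safe
        "parameters": params,
    }
    session.analysis_history.append(entry)
    session.updated_at = now
    return entry


# Stand-in object with only the two attributes the helper touches.
session = SimpleNamespace(analysis_history=[], updated_at=datetime(2024, 1, 1))
record_analysis(session, "descriptive", columns=["sales"])
print(len(session.analysis_history), session.analysis_history[0]["type"])
```

Storing the timestamp as an ISO 8601 string is a design choice made here so that the history can be passed to `json.dumps` without a custom encoder.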
Similar Components
AI-powered semantic similarity - components with related functionality:
- class StatisticalSession (80.7% similar)
- class StatisticalSession_v1 (77.7% similar)
- class DataAnalysisSession_v1 (75.8% similar)
- class DataAnalysisSession (72.7% similar)
- function smartstat_get_history (70.4% similar)