class DataSource_v2
A dataclass that encapsulates configuration for various data sources including files, SQL databases, and SQL workflow metadata.
/tf/active/vicechatdev/vice_ai/smartstat_models.py
46 - 76
simple
Purpose
DataSource serves as a configuration container for different types of data sources in a data analysis system. It supports file-based sources, SQL database connections, and includes fields for SQL workflow automation where LLMs can generate queries from user requests. The class handles serialization/deserialization to/from dictionaries and manages default values for optional parameters.
Source Code
@dataclass  # required for asdict() and __post_init__ to take effect
class DataSource:
    """Data source configuration"""
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    # New fields for SQL workflow
    user_query: Optional[str] = None  # Original user analysis request
    schema_file: Optional[str] = None  # Path to database schema JSON
    connection_config: Optional[str] = None  # Path to connection config
    generated_sql: Optional[str] = None  # LLM-generated SQL query
    query_explanation: Optional[str] = None  # Explanation of generated query
    parameters: Dict[str, Any] = None  # replaced by {} in __post_init__

    def __post_init__(self):
        # Give each instance its own dict instead of sharing a mutable default
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary with enum values as strings"""
        result = asdict(self)
        result['source_type'] = self.source_type.value
        return result

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'DataSource':
        """Create instance from dictionary"""
        # Note: this rewrites data['source_type'] in the caller's dict
        if isinstance(data.get('source_type'), str):
            data['source_type'] = DataSourceType(data['source_type'])
        return cls(**data)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
source_type: A DataSourceType enum value indicating the type of data source (e.g., FILE, SQL, DATABASE). This is the only required parameter.
file_path: Optional string path to a file-based data source. Used when source_type indicates a file source.
sql_connection: Optional string containing SQL database connection information (connection string or identifier).
sql_query: Optional string containing a SQL query to execute against the database.
table_name: Optional string specifying the database table name to query or interact with.
user_query: Optional string containing the original user's natural language analysis request, used in SQL workflow automation.
schema_file: Optional string path to a JSON file containing the database schema definition for SQL generation.
connection_config: Optional string path to a configuration file containing database connection settings.
generated_sql: Optional string storing the SQL query generated by an LLM from the user_query.
query_explanation: Optional string containing a human-readable explanation of the generated SQL query.
parameters: Dictionary of additional key-value parameters for the data source. Defaults to empty dict if not provided.
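Because only source_type is required, nothing stops a file source from being created without a file_path. The following is a minimal sketch of a check that reports unset fields per source type. The `REQUIRED` mapping, the `missing_fields` helper, and the enum members are assumptions made for illustration; they are not part of the documented class, and the DataSource here is trimmed to the fields the check needs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DataSourceType(Enum):
    # Assumed members; the real enum is defined elsewhere in the module.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the documented class, for illustration only.
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None


# Hypothetical rules: which fields each source type is expected to set.
REQUIRED = {
    DataSourceType.FILE: ("file_path",),
    DataSourceType.SQL: ("sql_connection", "sql_query"),
}


def missing_fields(source: DataSource) -> List[str]:
    """Return the names of expected fields that are still None."""
    return [
        name
        for name in REQUIRED[source.source_type]
        if getattr(source, name) is None
    ]


incomplete_file = DataSource(DataSourceType.FILE)
partial_sql = DataSource(DataSourceType.SQL, sql_connection="sqlite:///demo.db")
```

A caller could run such a check right after construction and refuse to proceed when the returned list is non-empty.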
Return Value
Instantiation returns a DataSource object with all specified attributes. The to_dict() method returns a dictionary representation with the source_type enum converted to its string value. The from_dict() class method returns a new DataSource instance created from a dictionary.
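To illustrate why to_dict() converts the enum, here is a small round trip through JSON. It is a sketch under stated assumptions: the enum members are assumed, and the class is trimmed to three fields, with the three methods copied from the source listing.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members; the real enum is defined elsewhere in the module.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed to three fields for brevity; methods mirror the source listing.
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        result = asdict(self)
        result["source_type"] = self.source_type.value
        return result

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


source = DataSource(DataSourceType.FILE, file_path="data.csv")

# An Enum member is not JSON-serializable, so to_dict() emits its value.
payload = json.dumps(source.to_dict())

# from_dict() converts the string back into the enum member.
restored = DataSource.from_dict(json.loads(payload))
```

Since dataclasses compare field by field, `restored == source` holds after the round trip.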
Class Interface
Methods
__post_init__(self) -> None
Purpose: Dataclass post-initialization hook that ensures the parameters attribute is initialized to an empty dictionary if None
Returns: None - modifies instance state in-place
to_dict(self) -> Dict[str, Any]
Purpose: Converts the DataSource instance to a dictionary representation with enum values converted to strings for serialization
Returns: Dictionary containing all instance attributes with source_type converted from enum to string value
from_dict(cls, data: Dict[str, Any]) -> 'DataSource'
Purpose: Class method that creates a DataSource instance from a dictionary, handling conversion of string source_type back to enum
Parameters:
data: Dictionary containing DataSource attributes, with source_type as either string or DataSourceType enum
Returns: New DataSource instance created from the dictionary data
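Two properties of from_dict() as listed are worth knowing: it overwrites data['source_type'] in the dictionary passed in, and cls(**data) raises TypeError when the dictionary contains a key that is not a field. The sketch below shows one possible variant that avoids both, and uses field(default_factory=dict) in place of the __post_init__ hook. `DataSourceSketch` is a hypothetical name and the enum members are assumed; this is not the module's implementation.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members; the real enum is defined elsewhere in the module.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSourceSketch:
    source_type: DataSourceType
    file_path: Optional[str] = None
    # default_factory gives every instance its own dict without __post_init__.
    parameters: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSourceSketch":
        known = {f.name for f in fields(cls)}
        # Copy only recognised keys, leaving the caller's dict untouched.
        clean = {k: v for k, v in data.items() if k in known}
        if isinstance(clean.get("source_type"), str):
            clean["source_type"] = DataSourceType(clean["source_type"])
        return cls(**clean)


raw = {"source_type": "file", "file_path": "data.csv", "extra": 1}
src = DataSourceSketch.from_dict(raw)
```

One behavioral difference to keep in mind: with default_factory, an explicit parameters=None stays None, whereas the __post_init__ hook in the source converts it to an empty dict.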
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| source_type | DataSourceType | Enum indicating the type of data source (required field) | instance |
| file_path | Optional[str] | Path to file-based data source | instance |
| sql_connection | Optional[str] | SQL database connection string or identifier | instance |
| sql_query | Optional[str] | SQL query to execute | instance |
| table_name | Optional[str] | Database table name | instance |
| user_query | Optional[str] | Original user's natural language analysis request for SQL workflow | instance |
| schema_file | Optional[str] | Path to JSON file containing database schema for SQL generation | instance |
| connection_config | Optional[str] | Path to database connection configuration file | instance |
| generated_sql | Optional[str] | SQL query generated by LLM from user_query | instance |
| query_explanation | Optional[str] | Human-readable explanation of the generated SQL query | instance |
| parameters | Dict[str, Any] | Additional configuration parameters as key-value pairs, initialized to empty dict if None | instance |
Dependencies
- dataclasses
- typing
- enum
Required Imports
from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional
from enum import Enum
Usage Example
from dataclasses import dataclass
from typing import Optional, Dict, Any
from enum import Enum

# DataSource is assumed to be defined as in the Source Code section above.
class DataSourceType(Enum):
    FILE = 'file'
    SQL = 'sql'
    DATABASE = 'database'

# Create a file-based data source
file_source = DataSource(
    source_type=DataSourceType.FILE,
    file_path='/path/to/data.csv',
    parameters={'delimiter': ',', 'encoding': 'utf-8'}
)

# Create a SQL data source
sql_source = DataSource(
    source_type=DataSourceType.SQL,
    sql_connection='postgresql://user:pass@localhost/db',
    sql_query='SELECT * FROM users',
    table_name='users'
)

# Create a SQL workflow source with LLM integration
workflow_source = DataSource(
    source_type=DataSourceType.SQL,
    user_query='Show me all users who signed up last month',
    schema_file='/path/to/schema.json',
    connection_config='/path/to/db_config.json'
)

# Serialize to dictionary (source_type becomes the string 'file')
data_dict = file_source.to_dict()

# Deserialize from dictionary (the string is converted back to the enum)
restored_source = DataSource.from_dict(data_dict)
Best Practices
- Always specify the source_type parameter as it is the only required field
- Use appropriate optional fields based on the source_type (e.g., file_path for file sources, sql_connection for SQL sources)
- The parameters dictionary is automatically initialized to an empty dict in __post_init__ if not provided
- When serializing/deserializing, use to_dict() and from_dict() methods to ensure proper enum handling
- For SQL workflow automation, populate user_query, schema_file, and connection_config, then store generated_sql and query_explanation after LLM processing
- Instances are mutable: a dataclass declared without frozen=True allows attributes to be reassigned after creation, so treat an instance as read-only by convention if it is shared between components
- Ensure DataSourceType enum is properly defined with string values for serialization to work correctly
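The SQL workflow practice above can be sketched as follows: create the source with the request and configuration paths, then attach the generated query and its explanation afterwards. `fake_generate_sql` is a hypothetical stand-in for whatever LLM call the surrounding system makes, the enum member is assumed, and the class is trimmed to the workflow fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class DataSourceType(Enum):
    SQL = "sql"  # assumed member; the real enum is defined elsewhere


@dataclass
class DataSource:
    # Trimmed copy of the documented class with only the workflow fields.
    source_type: DataSourceType
    user_query: Optional[str] = None
    schema_file: Optional[str] = None
    connection_config: Optional[str] = None
    generated_sql: Optional[str] = None
    query_explanation: Optional[str] = None


def fake_generate_sql(user_query: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the LLM call; returns (sql, explanation)."""
    return "SELECT * FROM users", f"Placeholder query for: {user_query}"


source = DataSource(
    source_type=DataSourceType.SQL,
    user_query="Show me all users who signed up last month",
    schema_file="/path/to/schema.json",
    connection_config="/path/to/db_config.json",
)

# A plain dataclass accepts attribute assignment, so the results can be
# stored on the same instance once generation has finished.
source.generated_sql, source.query_explanation = fake_generate_sql(source.user_query)
```

Keeping the request and the generated query on one object means a later to_dict() call captures both, which may be useful for auditing what the LLM produced.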
Similar Components
AI-powered semantic similarity - components with related functionality:
- class DataSource_v1 92.6% similar
- class DataSource 92.1% similar
- class DataSourceType_v1 72.8% similar
- class DataSourceType_v2 70.2% similar
- class DataSourceType 68.9% similar