class DataSource_v1
A dataclass that encapsulates configuration for various data sources used in analysis, supporting file-based, SQL database, and query-based data sources.
/tf/active/vicechatdev/vice_ai/models.py
1907 - 1936
moderate
Purpose
DataSource serves as a configuration container for different types of data sources in an analysis system. It supports multiple source types (files, SQL databases, user queries) and stores related metadata including connection details, queries, schema information, and LLM-generated SQL. The class provides serialization/deserialization methods for persistence and includes fields for tracking the analysis workflow from user query to generated SQL with explanations.
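The workflow fields work as a two-step fill. The user's request is stored first, and the SQL and its explanation are recorded once they have been generated. The sketch below uses a trimmed copy of the class and a hypothetical `generate_sql()` stub in place of the real LLM call, which this page does not show. The `DataSourceType` member is also assumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class DataSourceType(Enum):
    # Assumed member, mirroring the usage example on this page
    USER_QUERY = "user_query"


@dataclass
class DataSource:
    # Trimmed copy: only the fields used in this workflow
    source_type: DataSourceType
    user_query: Optional[str] = None
    schema_file: Optional[str] = None
    generated_sql: Optional[str] = None
    query_explanation: Optional[str] = None


def generate_sql(user_query: str, schema_file: str) -> Tuple[str, str]:
    # Hypothetical stand-in for the LLM step; returns (sql, explanation)
    return (
        "SELECT region, SUM(amount) FROM sales GROUP BY region",
        f"Answers '{user_query}' by aggregating sales amounts per region",
    )


# Step 1: record the original request and the schema it should be answered against
source = DataSource(
    source_type=DataSourceType.USER_QUERY,
    user_query="Show me total sales by region",
    schema_file="/path/to/schema.json",
)

# Step 2: store the generated SQL next to the request for traceability
source.generated_sql, source.query_explanation = generate_sql(
    source.user_query, source.schema_file
)
print(source.generated_sql)
```

Because the dataclass is not frozen, the generated fields can be filled in place on the same instance.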
Source Code
from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional

# DataSourceType (an Enum) must be defined in the same module or imported


@dataclass
class DataSource:
    """Data source configuration for analysis"""
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    user_query: Optional[str] = None  # Original user analysis request
    schema_file: Optional[str] = None  # Path to database schema JSON
    connection_config: Optional[str] = None  # Path to connection config
    generated_sql: Optional[str] = None  # LLM-generated SQL query
    query_explanation: Optional[str] = None  # Explanation of generated query
    parameters: Optional[Dict[str, Any]] = None  # Replaced with {} in __post_init__

    def __post_init__(self):
        # Give each instance its own dict (a literal {} default is not allowed in a dataclass)
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary with enum values as strings"""
        result = asdict(self)
        result['source_type'] = self.source_type.value
        return result

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'DataSource':
        """Create instance from dictionary"""
        data = dict(data)  # Copy so the caller's dict is not mutated
        if isinstance(data.get('source_type'), str):
            data['source_type'] = DataSourceType(data['source_type'])
        return cls(**data)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
source_type: A DataSourceType enum value indicating the type of data source (e.g., file, SQL database, user query). This is a required field that determines which other fields are relevant.
file_path: Optional string path to a file-based data source. Used when source_type indicates a file-based source.
sql_connection: Optional string containing SQL database connection information (e.g., connection string or database URL).
sql_query: Optional string containing a SQL query to execute against the database.
table_name: Optional string specifying the name of a database table to query or analyze.
user_query: Optional string containing the original natural language analysis request from the user, used for LLM-based query generation.
schema_file: Optional string path to a JSON file containing the database schema definition.
connection_config: Optional string path to a configuration file containing database connection parameters.
generated_sql: Optional string containing SQL query generated by an LLM based on the user_query.
query_explanation: Optional string containing a human-readable explanation of the generated SQL query.
parameters: Dictionary of additional key-value parameters for flexible configuration. Defaults to empty dict if not provided.
Return Value
Instantiation returns a DataSource object with all specified attributes. The to_dict() method returns a dictionary representation with the source_type enum converted to its string value. The from_dict() class method returns a new DataSource instance reconstructed from a dictionary.
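The string conversion in `to_dict()` is what makes JSON persistence work. `asdict()` on its own keeps the enum member, which the standard `json` module cannot encode. The sketch below uses a trimmed copy of the class with the same methods (including a copy in `from_dict()` so the input is not mutated). The enum members are assumed, following the usage example on this page.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members, mirroring the usage example on this page
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the class above (fewer fields, same methods)
    source_type: DataSourceType
    sql_connection: Optional[str] = None
    table_name: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        result = asdict(self)
        result["source_type"] = self.source_type.value
        return result

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        data = dict(data)  # copy so the caller's dict is left untouched
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


source = DataSource(DataSourceType.SQL, sql_connection="sqlite:///sales.db", table_name="sales_data")

# asdict() alone keeps the enum member, which json.dumps cannot encode
try:
    json.dumps(asdict(source))
    raw_ok = True
except TypeError:
    raw_ok = False

payload = json.dumps(source.to_dict())  # source_type is stored as "sql"
restored = DataSource.from_dict(json.loads(payload))
print(restored == source)  # True
```

Dataclass equality compares all fields, so the round trip can be checked with a single `==`.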
Class Interface
Methods
__post_init__(self) -> None
Purpose: Dataclass post-initialization hook that ensures the parameters attribute is initialized to an empty dictionary if None
Returns: None - modifies instance state in-place
to_dict(self) -> Dict[str, Any]
Purpose: Converts the DataSource instance to a dictionary representation with enum values converted to strings for serialization
Returns: Dictionary containing all instance attributes with source_type enum converted to its string value
from_dict(cls, data: Dict[str, Any]) -> 'DataSource'
Purpose: Class method that creates a DataSource instance from a dictionary, handling string-to-enum conversion for source_type
Parameters:
data: Dictionary containing DataSource attributes, with source_type as either a string or DataSourceType enum
Returns: New DataSource instance constructed from the dictionary data
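`__post_init__` exists because a dataclass rejects a mutable default such as `= {}` when the class is defined, and a shared dict would otherwise leak between instances. The sketch compares the pattern used by `DataSource` with the standard `field(default_factory=dict)` idiom. The class names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WithPostInit:
    # Pattern used by DataSource: None default, replaced per instance
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        if self.parameters is None:
            self.parameters = {}


@dataclass
class WithFactory:
    # Standard-library alternative: default_factory builds a new dict per instance
    parameters: Dict[str, Any] = field(default_factory=dict)


a, b = WithPostInit(), WithPostInit()
a.parameters["delimiter"] = ","
print(b.parameters)  # {} because each instance owns its own dict

# A literal {} default is rejected when the decorator processes the class
rejected = False
try:
    @dataclass
    class Broken:
        parameters: Dict[str, Any] = {}
except ValueError:
    rejected = True
print(rejected)  # True
```

One behavioral difference: the `__post_init__` version also turns an explicit `parameters=None` into `{}`, while `default_factory` only applies when the argument is omitted.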
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| source_type | DataSourceType | Enum indicating the type of data source (required field) | instance |
| file_path | Optional[str] | Path to a file-based data source | instance |
| sql_connection | Optional[str] | SQL database connection string or identifier | instance |
| sql_query | Optional[str] | SQL query to execute against the database | instance |
| table_name | Optional[str] | Name of the database table to query | instance |
| user_query | Optional[str] | Original natural language analysis request from the user | instance |
| schema_file | Optional[str] | Path to JSON file containing database schema definition | instance |
| connection_config | Optional[str] | Path to configuration file with connection parameters | instance |
| generated_sql | Optional[str] | SQL query generated by LLM from user_query | instance |
| query_explanation | Optional[str] | Human-readable explanation of the generated SQL query | instance |
| parameters | Dict[str, Any] | Additional configuration parameters as key-value pairs, initialized to empty dict if None | instance |
Dependencies
dataclasses, typing, enum
Required Imports
from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional
from enum import Enum
Conditional/Optional Imports
These imports are only needed under specific conditions:
DataSourceType enum definition
Condition: Required for the source_type field - must be defined in the same module or imported
Required (conditional)
Usage Example
# Illustrative DataSourceType definition (assumed; use the real enum from models.py if available)
from enum import Enum


class DataSourceType(Enum):
    FILE = 'file'
    SQL = 'sql'
    USER_QUERY = 'user_query'


# Create a file-based data source
file_source = DataSource(
    source_type=DataSourceType.FILE,
    file_path='/path/to/data.csv',
    parameters={'delimiter': ',', 'encoding': 'utf-8'}
)

# Create a SQL-based data source
sql_source = DataSource(
    source_type=DataSourceType.SQL,
    sql_connection='postgresql://user:pass@localhost/db',
    table_name='sales_data',
    sql_query='SELECT * FROM sales_data WHERE year = 2023'
)

# Create a user query-based source with LLM generation
query_source = DataSource(
    source_type=DataSourceType.USER_QUERY,
    user_query='Show me total sales by region',
    schema_file='/path/to/schema.json',
    generated_sql='SELECT region, SUM(amount) FROM sales GROUP BY region',
    query_explanation='This query aggregates sales amounts by region'
)

# Serialize to dictionary (source_type becomes the string 'file')
data_dict = file_source.to_dict()

# Deserialize from dictionary (the string is converted back to the enum)
restored_source = DataSource.from_dict(data_dict)
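`from_dict()` passes every key straight to the constructor, so a stored record containing a key the class does not define (for example one written by a different version of the class) raises `TypeError`. Below is a minimal sketch of a tolerant loader, assuming it is acceptable to silently drop unknown keys. `from_dict_lenient` and the `owner` key are hypothetical, and the class is a trimmed copy.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed member, mirroring the usage example above
    FILE = "file"


@dataclass
class DataSource:
    # Trimmed copy with the same from_dict logic as the class on this page
    source_type: DataSourceType
    file_path: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        data = dict(data)
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


def from_dict_lenient(data: Dict[str, Any]) -> DataSource:
    # Hypothetical helper: drop keys that are not fields of this class version
    known = {f.name for f in fields(DataSource)}
    return DataSource.from_dict({k: v for k, v in data.items() if k in known})


# A stored record carrying a key this version does not define
record = {"source_type": "file", "file_path": "/path/to/data.csv", "owner": "analyst"}

try:
    DataSource.from_dict(record)
    strict_ok = True
except TypeError:  # cls(**data) rejects the unexpected 'owner' keyword
    strict_ok = False

restored = from_dict_lenient(record)
print(strict_ok, restored.file_path)
```

Dropping keys hides schema drift, so logging the discarded names may be preferable in practice.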
Best Practices
- Always specify the source_type parameter as it determines which other fields are relevant
- Passing parameters is optional: __post_init__ replaces None with a new empty dict for each instance, so custom keys can be added after construction
- Use to_dict() for serialization before storing to JSON or databases to ensure enum values are properly converted to strings
- Use from_dict() class method for deserialization to ensure proper enum reconstruction
- For SQL sources, provide either sql_query or table_name, or both depending on your use case
- When using LLM-generated queries, populate user_query, generated_sql, and query_explanation for full traceability
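- Keep schema and connection details in the files referenced by schema_file and connection_config rather than inline; to_dict() serializes every field, so credentials embedded in sql_connection (for example a password in a connection URL) end up in any stored output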
- The class is mutable: it is not declared with frozen=True, so attributes can be reassigned after creation
- Validate that required fields for your specific source_type are populated after instantiation
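The class itself does not check which fields a given source_type needs. Below is a minimal sketch of such a check. The rules are assumptions drawn from the practices above (a SQL source needs sql_query or table_name, plus a connection via sql_connection or connection_config). `validate` and the enum members are hypothetical, and the class is a trimmed copy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DataSourceType(Enum):
    # Assumed members, mirroring the usage example on this page
    FILE = "file"
    SQL = "sql"
    USER_QUERY = "user_query"


@dataclass
class DataSource:
    # Trimmed copy: only the fields the checks below inspect
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    user_query: Optional[str] = None
    connection_config: Optional[str] = None


def validate(source: DataSource) -> List[str]:
    """Hypothetical per-type checks; adjust the rules to your own pipeline."""
    problems: List[str] = []
    if source.source_type is DataSourceType.FILE:
        if not source.file_path:
            problems.append("file source needs file_path")
    elif source.source_type is DataSourceType.SQL:
        if not (source.sql_connection or source.connection_config):
            problems.append("SQL source needs sql_connection or connection_config")
        if not (source.sql_query or source.table_name):
            problems.append("SQL source needs sql_query or table_name")
    elif source.source_type is DataSourceType.USER_QUERY:
        if not source.user_query:
            problems.append("user-query source needs user_query")
    return problems


print(validate(DataSource(DataSourceType.SQL, sql_connection="sqlite:///sales.db")))
# ['SQL source needs sql_query or table_name']
```

Returning a list of messages rather than raising lets a caller report every missing field at once.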
Similar Components
AI-powered semantic similarity - components with related functionality:
- class DataSource_v2 (92.6% similar)
- class DataSource (90.3% similar)
- class DataSourceType_v1 (71.2% similar)
- class DataSourceType_v2 (69.8% similar)
- class DataSourceType (66.6% similar)