class DataSource
A dataclass that represents configuration for various data sources, supporting file-based, SQL database, and query-based data access patterns.
/tf/active/vicechatdev/vice_ai/models.py
1717 - 1746
simple
Purpose
DataSource serves as a configuration container for different types of data sources in a data processing or analytics system. It supports multiple source types (files, SQL databases, user queries) and stores all necessary connection details, queries, and metadata. The class provides serialization/deserialization capabilities for persistence and transmission, making it suitable for configuration management, data pipeline definitions, and query execution workflows.
Source Code
@dataclass
class DataSource:
    """Data source configuration"""
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    user_query: Optional[str] = None
    schema_file: Optional[str] = None
    connection_config: Optional[str] = None
    generated_sql: Optional[str] = None
    query_explanation: Optional[str] = None
    parameters: Dict[str, Any] = None

    def __post_init__(self):
        # Replace the None default with a fresh dict per instance
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary with enum values as strings"""
        result = asdict(self)
        result['source_type'] = self.source_type.value
        return result

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'DataSource':
        """Create instance from dictionary"""
        # Note: this converts data['source_type'] in place on the caller's dict
        if isinstance(data.get('source_type'), str):
            data['source_type'] = DataSourceType(data['source_type'])
        return cls(**data)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
source_type: A DataSourceType enum value that specifies the type of data source (e.g., FILE, SQL, USER_QUERY). This is a required field that determines which other fields are relevant.
file_path: Optional string path to a file-based data source. Used when source_type indicates a file-based source.
sql_connection: Optional string containing SQL database connection information (connection string or identifier). Used for SQL-based data sources.
sql_query: Optional string containing the SQL query to execute against the database. Used when retrieving data via SQL.
table_name: Optional string specifying the database table name. Used for direct table access without custom queries.
user_query: Optional string containing a natural language or user-provided query. May be used for query generation or search functionality.
schema_file: Optional string path to a schema definition file that describes the structure of the data source.
connection_config: Optional string containing additional connection configuration details, possibly in JSON or other serialized format.
generated_sql: Optional string storing SQL that was automatically generated, typically from a user_query. Used for tracking query generation results.
query_explanation: Optional string providing human-readable explanation of the query or data retrieval logic.
parameters: Dictionary of additional key-value parameters for flexible configuration. Defaults to empty dict if not provided. Can store source-specific settings.
Return Value
Instantiation returns a DataSource object with all specified configuration. The to_dict() method returns a dictionary representation with the source_type enum converted to its string value. The from_dict() class method returns a new DataSource instance reconstructed from a dictionary.
Class Interface
Methods
__post_init__(self) -> None
Purpose: Dataclass post-initialization hook that ensures the parameters dictionary is initialized to an empty dict if None
Returns: None - modifies instance state in place
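The None default combined with __post_init__ is a common way around the dataclass rule that rejects a literal {} as a field default. The sketch below isolates that pattern and compares it with field(default_factory=dict), which is one possible alternative and not what models.py uses. WithPostInit and WithFactory are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WithPostInit:
    # Same pattern as DataSource: None default, replaced in __post_init__
    parameters: Dict[str, Any] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}


@dataclass
class WithFactory:
    # Alternative (assumption, not the original code): a fresh dict per instance
    parameters: Dict[str, Any] = field(default_factory=dict)


a, b = WithPostInit(), WithPostInit()
a.parameters["delimiter"] = ","
# Each instance owns its own dict, so b is unaffected by the change to a
print(a.parameters, b.parameters)

# Passing None explicitly is normalized only by the __post_init__ variant
normalized = WithPostInit(parameters=None).parameters
untouched = WithFactory(parameters=None).parameters
print(normalized, untouched)
```

The practical difference is the explicit-None case: the __post_init__ approach used by DataSource also covers callers that pass parameters=None, which can happen when a configuration is rebuilt from stored data.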
to_dict(self) -> Dict[str, Any]
Purpose: Converts the DataSource instance to a dictionary representation with enum values converted to strings for serialization
Returns: Dictionary containing all instance attributes with source_type converted from enum to its string value
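To illustrate why to_dict() replaces the enum with its value, here is a minimal sketch that feeds the result to the standard json module. The DataSource below is an abridged copy with only the fields the example needs, and the DataSourceType members are assumptions, since the real enum is defined elsewhere in models.py.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Hypothetical members; the real enum is not shown in this section
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Abridged copy of the class above: only the fields this sketch uses
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Dict[str, Any] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        result = asdict(self)
        result["source_type"] = self.source_type.value
        return result


source = DataSource(DataSourceType.FILE, file_path="/data/sales.csv",
                    parameters={"delimiter": ","})

# asdict() alone leaves the enum member in place, which json cannot encode
try:
    json.dumps(asdict(source))
except TypeError as exc:
    print("asdict only:", exc)

# to_dict() yields plain strings and dicts, so it serializes cleanly
payload = json.dumps(source.to_dict())
print(payload)

# asdict() copies nested dicts, so editing the result leaves the instance alone
snapshot = source.to_dict()
snapshot["parameters"]["delimiter"] = ";"
print(source.parameters["delimiter"])
```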
from_dict(cls, data: Dict[str, Any]) -> 'DataSource'
Purpose: Class method that creates a DataSource instance from a dictionary, handling conversion of string source_type back to enum
Parameters:
data: Dictionary containing DataSource configuration with keys matching class attributes. The 'source_type' can be either a string or DataSourceType enum
Returns: New DataSource instance constructed from the provided dictionary data
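Two behaviors of from_dict() follow directly from its code and are easy to miss: it rewrites data['source_type'] in the caller's dictionary, and any key that is not a DataSource field makes cls(**data) raise TypeError. The sketch below demonstrates both and shows one possible wrapper. load_data_source is a hypothetical helper, not part of models.py, and the class is abridged with assumed enum members.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Hypothetical members; the real enum is not shown in this section
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Abridged copy of the class above: only the fields this sketch uses
    source_type: DataSourceType
    sql_query: Optional[str] = None
    parameters: Dict[str, Any] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


def load_data_source(data: Dict[str, Any]) -> DataSource:
    """Hypothetical helper: leave `data` untouched and drop unknown keys."""
    known = {f.name for f in fields(DataSource)}
    clean = {key: value for key, value in data.items() if key in known}
    return DataSource.from_dict(clean)


# 1. from_dict() converts the string to an enum inside the caller's dict
plain = {"source_type": "sql"}
DataSource.from_dict(plain)
print(type(plain["source_type"]).__name__)

# 2. A key that is not a field is rejected by the generated __init__
raw = {"source_type": "sql", "sql_query": "SELECT 1", "created_by": "alice"}
try:
    DataSource.from_dict(dict(raw))
except TypeError as exc:
    print("unknown key rejected:", exc)

# 3. The wrapper filters keys and works on a copy, so raw keeps its string
source = load_data_source(raw)
print(source.source_type, raw["source_type"])
```

A string that does not match any enum value raises ValueError from the DataSourceType constructor, and a dictionary without source_type raises TypeError for the missing required argument.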
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| source_type | DataSourceType | Enum value indicating the type of data source (required field) | instance |
| file_path | Optional[str] | Path to file-based data source | instance |
| sql_connection | Optional[str] | SQL database connection string or identifier | instance |
| sql_query | Optional[str] | SQL query to execute for data retrieval | instance |
| table_name | Optional[str] | Database table name for direct table access | instance |
| user_query | Optional[str] | Natural language or user-provided query string | instance |
| schema_file | Optional[str] | Path to schema definition file describing data structure | instance |
| connection_config | Optional[str] | Additional connection configuration details | instance |
| generated_sql | Optional[str] | Automatically generated SQL query, typically from user_query | instance |
| query_explanation | Optional[str] | Human-readable explanation of the query or data retrieval logic | instance |
| parameters | Dict[str, Any] | Dictionary of additional configuration parameters, initialized to empty dict if None | instance |
Dependencies
- dataclasses
- typing
- enum
Required Imports
from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional
from enum import Enum
Usage Example
from dataclasses import dataclass, asdict
from typing import Optional, Dict, Any
from enum import Enum

class DataSourceType(Enum):
    FILE = 'file'
    SQL = 'sql'
    USER_QUERY = 'user_query'

# DataSource is assumed to be defined as in the Source Code section above

# Create a file-based data source
file_source = DataSource(
    source_type=DataSourceType.FILE,
    file_path='/data/sales.csv',
    schema_file='/schemas/sales_schema.json',
    parameters={'delimiter': ',', 'encoding': 'utf-8'}
)

# Create a SQL-based data source
sql_source = DataSource(
    source_type=DataSourceType.SQL,
    sql_connection='postgresql://localhost:5432/mydb',
    table_name='customers',
    sql_query='SELECT * FROM customers WHERE active = true'
)

# Serialize to dictionary
config_dict = file_source.to_dict()
print(config_dict['source_type'])  # 'file'

# Deserialize from dictionary
restored_source = DataSource.from_dict(config_dict)

# Access attributes
if restored_source.file_path:
    print(f'Loading data from {restored_source.file_path}')

# Use parameters
delimiter = restored_source.parameters.get('delimiter', ',')
Best Practices
- Always specify the source_type parameter as it determines which other fields are relevant
- Use the parameters dictionary for source-specific configuration that doesn't fit standard fields
- The __post_init__ method ensures parameters is never None, so it's safe to access without checking
- When serializing/deserializing, use to_dict() and from_dict() to properly handle enum conversion
- Only populate fields relevant to your source_type to keep the configuration clean
- Store generated_sql and query_explanation for debugging and audit trails when using query generation
- Instances are mutable: the dataclass is not frozen (__post_init__ assigns to self.parameters, which a frozen dataclass would reject), so fields can be modified after instantiation
- Use schema_file to maintain separation between data and schema definitions
- Connection strings in sql_connection should follow standard database URL formats
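The advice to populate only the fields relevant to a source_type can be checked mechanically. The sketch below is one possible approach under stated assumptions: the REQUIRED_FIELDS mapping and the missing_fields helper are hypothetical and not part of models.py, the enum members mirror the ones in the Usage Example, and the class is abridged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class DataSourceType(Enum):
    # Hypothetical members, mirroring the Usage Example
    FILE = "file"
    SQL = "sql"
    USER_QUERY = "user_query"


@dataclass
class DataSource:
    # Abridged copy of the class above: only the fields this sketch uses
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    user_query: Optional[str] = None
    parameters: Dict[str, Any] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}


# Hypothetical mapping: which fields each source type is assumed to need
REQUIRED_FIELDS = {
    DataSourceType.FILE: ("file_path",),
    DataSourceType.SQL: ("sql_connection",),
    DataSourceType.USER_QUERY: ("user_query",),
}


def missing_fields(source: DataSource) -> List[str]:
    """Return the assumed-required fields that are still None."""
    required = REQUIRED_FIELDS[source.source_type]
    return [name for name in required if getattr(source, name) is None]


ok = DataSource(DataSourceType.FILE, file_path="/data/sales.csv")
incomplete = DataSource(DataSourceType.SQL, sql_query="SELECT 1")

print(missing_fields(ok))
print(missing_fields(incomplete))
```

Keeping the check outside the class leaves DataSource a plain configuration container; which fields actually count as required depends on how the surrounding system consumes each source type.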
Similar Components
AI-powered semantic similarity - components with related functionality:
- class DataSource_v2 92.1% similar
- class DataSource_v1 90.3% similar
- class DataSourceType 72.9% similar
- class DataSourceType_v2 72.6% similar
- class DataSourceType_v1 72.2% similar