class DataSource

Maturity: 48

A dataclass that represents configuration for various data sources, supporting file-based, SQL database, and query-based data access patterns.

File: /tf/active/vicechatdev/vice_ai/models.py
Lines: 1717 - 1746
Complexity: simple

Purpose

DataSource serves as a configuration container for different types of data sources in a data processing or analytics system. It supports multiple source types (files, SQL databases, user queries) and stores all necessary connection details, queries, and metadata. The class provides serialization/deserialization capabilities for persistence and transmission, making it suitable for configuration management, data pipeline definitions, and query execution workflows.

Source Code

@dataclass
class DataSource:
    """Data source configuration"""
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    user_query: Optional[str] = None
    schema_file: Optional[str] = None
    connection_config: Optional[str] = None
    generated_sql: Optional[str] = None
    query_explanation: Optional[str] = None
    parameters: Dict[str, Any] = None
    
    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}
    
    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary with enum values as strings"""
        result = asdict(self)
        result['source_type'] = self.source_type.value
        return result
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'DataSource':
        """Create instance from dictionary"""
        if isinstance(data.get('source_type'), str):
            data['source_type'] = DataSourceType(data['source_type'])
        return cls(**data)

Parameters

Name Type Default Kind
bases - -

Parameter Details

source_type: A DataSourceType enum value that specifies the type of data source (e.g., FILE, SQL, USER_QUERY). This is a required field that determines which other fields are relevant.

file_path: Optional string path to a file-based data source. Used when source_type indicates a file-based source.

sql_connection: Optional string containing SQL database connection information (connection string or identifier). Used for SQL-based data sources.

sql_query: Optional string containing the SQL query to execute against the database. Used when retrieving data via SQL.

table_name: Optional string specifying the database table name. Used for direct table access without custom queries.

user_query: Optional string containing a natural language or user-provided query. May be used for query generation or search functionality.

schema_file: Optional string path to a schema definition file that describes the structure of the data source.

connection_config: Optional string containing additional connection configuration details, possibly in JSON or other serialized format.

generated_sql: Optional string storing SQL that was automatically generated, typically from a user_query. Used for tracking query generation results.

query_explanation: Optional string providing human-readable explanation of the query or data retrieval logic.

parameters: Dictionary of additional key-value parameters for flexible configuration. Defaults to empty dict if not provided. Can store source-specific settings.

Return Value

Instantiation returns a DataSource object with all specified configuration. The to_dict() method returns a dictionary representation with the source_type enum converted to its string value. The from_dict() class method returns a new DataSource instance reconstructed from a dictionary.

Class Interface

Methods

__post_init__(self) -> None

Purpose: Dataclass post-initialization hook that ensures the parameters dictionary is initialized to an empty dict if None

Returns: None - modifies instance state in place
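The effect of the hook can be sketched with a trimmed copy of the class. This is an illustration only: the enum members are assumed, and most fields are left out for brevity. It shows that each instance gets its own dictionary, and that the hook runs only at construction time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members and values; the real enum in models.py may differ.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the documented class, keeping only what the hook touches.
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}


a = DataSource(source_type=DataSourceType.FILE)
b = DataSource(source_type=DataSourceType.FILE)

# Each instance receives its own dict, so mutating one leaves the other alone.
a.parameters["delimiter"] = ";"
print(a.parameters)  # {'delimiter': ';'}
print(b.parameters)  # {}

# The hook is not re-run on assignment, so a later None is not repaired.
a.parameters = None
print(a.parameters)  # None
```

Using a None default plus the hook avoids the shared mutable default that a literal `{}` would create; `field(default_factory=dict)` from the dataclasses module is the usual alternative.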

to_dict(self) -> Dict[str, Any]

Purpose: Converts the DataSource instance to a dictionary representation with enum values converted to strings for serialization

Returns: Dictionary containing all instance attributes with source_type converted from enum to its string value
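One reason for the enum-to-string conversion is JSON serialization. The sketch below uses a trimmed copy of the class with assumed enum members. It shows that the output of to_dict() can be passed to json.dumps, whereas the raw asdict() result cannot, because it still holds the enum object.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members and values; the real enum in models.py may differ.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the documented class for illustration.
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        result = asdict(self)
        result["source_type"] = self.source_type.value
        return result


source = DataSource(
    DataSourceType.FILE,
    file_path="/data/sales.csv",
    parameters={"delimiter": ","},
)

# to_dict() yields only JSON-friendly values for this instance.
payload = json.dumps(source.to_dict(), sort_keys=True)
print(payload)
# {"file_path": "/data/sales.csv", "parameters": {"delimiter": ","}, "source_type": "file"}

# asdict() alone keeps the enum member, which json cannot encode.
try:
    json.dumps(asdict(source))
except TypeError as exc:
    print(f"asdict alone fails: {exc}")
```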

from_dict(cls, data: Dict[str, Any]) -> 'DataSource'

Purpose: Class method that creates a DataSource instance from a dictionary, handling conversion of string source_type back to enum

Parameters:

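  • data: Dictionary containing DataSource configuration with keys matching class attributes. The 'source_type' can be either a string or a DataSourceType enum. A string value is replaced with the enum in the passed dictionary itself, so the caller's dictionary is modified. Keys that do not match a class attribute raise a TypeError.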

Returns: New DataSource instance constructed from the provided dictionary data
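Because from_dict() writes the converted enum back into the dictionary it receives, callers that reuse the dictionary may want to pass a copy. The sketch below uses a trimmed copy of the class with assumed enum members; load_data_source is a hypothetical wrapper, not part of the module.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members and values; the real enum in models.py may differ.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the documented class for illustration.
    source_type: DataSourceType
    file_path: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


def load_data_source(data: Dict[str, Any]) -> DataSource:
    """Hypothetical wrapper: shallow-copy the input so it is left untouched."""
    return DataSource.from_dict(dict(data))


# Direct call: the string in the caller's dict is replaced by the enum member.
raw = {"source_type": "file", "file_path": "/data/sales.csv"}
DataSource.from_dict(raw)
print(raw["source_type"])  # DataSourceType.FILE

# Through the wrapper: the original dict keeps its string value.
raw_copy_safe = {"source_type": "file", "file_path": "/data/sales.csv"}
restored = load_data_source(raw_copy_safe)
print(raw_copy_safe["source_type"])  # file
print(restored.source_type)  # DataSourceType.FILE
```

A shallow copy is enough here because only the top-level 'source_type' key is reassigned.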

Attributes

Name | Type | Description | Scope
source_type | DataSourceType | Enum value indicating the type of data source (required field) | instance
file_path | Optional[str] | Path to file-based data source | instance
sql_connection | Optional[str] | SQL database connection string or identifier | instance
sql_query | Optional[str] | SQL query to execute for data retrieval | instance
table_name | Optional[str] | Database table name for direct table access | instance
user_query | Optional[str] | Natural language or user-provided query string | instance
schema_file | Optional[str] | Path to schema definition file describing data structure | instance
connection_config | Optional[str] | Additional connection configuration details | instance
generated_sql | Optional[str] | Automatically generated SQL query, typically from user_query | instance
query_explanation | Optional[str] | Human-readable explanation of the query or data retrieval logic | instance
parameters | Dict[str, Any] | Dictionary of additional configuration parameters, initialized to empty dict if None | instance

Dependencies

  • dataclasses
  • typing
  • enum

Required Imports

from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional
from enum import Enum

Usage Example

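from dataclasses import dataclass, asdict
from typing import Optional, Dict, Any
from enum import Enum

# Illustrative enum members; the real DataSourceType is defined in models.py
class DataSourceType(Enum):
    FILE = 'file'
    SQL = 'sql'
    USER_QUERY = 'user_query'

# The DataSource class from the Source Code section (with @dataclass applied)
# is assumed to be defined or imported at this point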

# Create a file-based data source
file_source = DataSource(
    source_type=DataSourceType.FILE,
    file_path='/data/sales.csv',
    schema_file='/schemas/sales_schema.json',
    parameters={'delimiter': ',', 'encoding': 'utf-8'}
)

# Create a SQL-based data source
sql_source = DataSource(
    source_type=DataSourceType.SQL,
    sql_connection='postgresql://localhost:5432/mydb',
    table_name='customers',
    sql_query='SELECT * FROM customers WHERE active = true'
)

# Serialize to dictionary
config_dict = file_source.to_dict()
print(config_dict['source_type'])  # 'file'

# Deserialize from dictionary
restored_source = DataSource.from_dict(config_dict)

# Access attributes
if restored_source.file_path:
    print(f'Loading data from {restored_source.file_path}')

# Use parameters
delimiter = restored_source.parameters.get('delimiter', ',')

Best Practices

  • Always specify the source_type parameter as it determines which other fields are relevant
  • Use the parameters dictionary for source-specific configuration that doesn't fit standard fields
  • The __post_init__ method ensures parameters is never None, so it's safe to access without checking
  • When serializing/deserializing, use to_dict() and from_dict() to properly handle enum conversion
  • Only populate fields relevant to your source_type to keep the configuration clean
  • Store generated_sql and query_explanation for debugging and audit trails when using query generation
  • The class is a regular (non-frozen) dataclass, so instances are mutable and fields can be reassigned after instantiation
  • Use schema_file to maintain separation between data and schema definitions
  • When sql_connection holds a connection string rather than an identifier, prefer a standard database URL format such as postgresql://localhost:5432/mydb
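The advice about populating only the fields relevant to a source type can be backed by a small check. The sketch below is one possible approach, not something the module provides: the enum members, the EXPECTED_FIELDS mapping, and the missing_fields helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DataSourceType(Enum):
    # Assumed members and values; the real enum in models.py may differ.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed stand-in for the documented class.
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None


# Hypothetical mapping: which fields each source type is expected to carry.
EXPECTED_FIELDS = {
    DataSourceType.FILE: ("file_path",),
    DataSourceType.SQL: ("sql_connection",),
}


def missing_fields(source: DataSource) -> List[str]:
    """Return the expected fields that are unset or empty for this source type."""
    expected = EXPECTED_FIELDS.get(source.source_type, ())
    return [name for name in expected if not getattr(source, name)]


ok = DataSource(DataSourceType.FILE, file_path="/data/sales.csv")
bad = DataSource(DataSourceType.SQL, table_name="customers")

print(missing_fields(ok))   # []
print(missing_fields(bad))  # ['sql_connection']
```

Keeping the rules in a mapping outside the class leaves the dataclass a plain configuration container, which matches how it is described here.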

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class DataSource_v2 92.1% similar

    A dataclass that encapsulates configuration for various data sources including files, SQL databases, and SQL workflow metadata.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class DataSource_v1 90.3% similar

    A dataclass that encapsulates configuration for various data sources used in analysis, supporting file-based, SQL database, and query-based data sources.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType 72.9% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType_v2 72.6% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType_v1 72.2% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py