
class DataSource_v2

Maturity: 44

A dataclass that encapsulates configuration for various data sources including files, SQL databases, and SQL workflow metadata.

File: /tf/active/vicechatdev/vice_ai/smartstat_models.py
Lines: 46 - 76
Complexity: simple

Purpose

DataSource serves as a configuration container for different types of data sources in a data analysis system. It supports file-based sources, SQL database connections, and includes fields for SQL workflow automation where LLMs can generate queries from user requests. The class handles serialization/deserialization to/from dictionaries and manages default values for optional parameters.

Source Code

@dataclass
class DataSource:
    """Data source configuration"""
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    # New fields for SQL workflow
    user_query: Optional[str] = None  # Original user analysis request
    schema_file: Optional[str] = None  # Path to database schema JSON
    connection_config: Optional[str] = None  # Path to connection config
    generated_sql: Optional[str] = None  # LLM-generated SQL query
    query_explanation: Optional[str] = None  # Explanation of generated query
    parameters: Dict[str, Any] = None
    
    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}
    
    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary with enum values as strings"""
        result = asdict(self)
        result['source_type'] = self.source_type.value
        return result
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'DataSource':
        """Create instance from dictionary"""
        if isinstance(data.get('source_type'), str):
            data['source_type'] = DataSourceType(data['source_type'])
        return cls(**data)

Parameters

Name Type Default Kind
bases - -

Parameter Details

source_type: A DataSourceType enum value indicating the type of data source (for example, a file or a SQL source). This is the only required parameter.

file_path: Optional string path to a file-based data source. Used when source_type indicates a file source.

sql_connection: Optional string containing SQL database connection information (connection string or identifier).

sql_query: Optional string containing a SQL query to execute against the database.

table_name: Optional string specifying the database table name to query or interact with.

user_query: Optional string containing the original user's natural language analysis request, used in SQL workflow automation.

schema_file: Optional string path to a JSON file containing the database schema definition for SQL generation.

connection_config: Optional string path to a configuration file containing database connection settings.

generated_sql: Optional string storing the SQL query generated by an LLM from the user_query.

query_explanation: Optional string containing a human-readable explanation of the generated SQL query.

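parameters: Dictionary of additional key-value parameters for the data source. Declared with a default of None; __post_init__ replaces None with an empty dict, so the attribute is always a dictionary after construction.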

Return Value

Instantiation returns a DataSource object with all specified attributes. The to_dict() method returns a dictionary representation with the source_type enum converted to its string value. The from_dict() class method returns a new DataSource instance created from a dictionary.

Class Interface

Methods

__post_init__(self) -> None

Purpose: Dataclass post-initialization hook that ensures the parameters attribute is initialized to an empty dictionary if None

Returns: None - modifies instance state in-place
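
The None-then-replace pattern is needed because a dataclass field cannot use a mutable default such as {} directly. The sketch below uses two trimmed-down stand-in classes (Cfg and CfgFactory are hypothetical names, not part of smartstat_models.py) to show that each instance ends up with its own dictionary, and that field(default_factory=dict) is a standard-library alternative with slightly different behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Cfg:
    """Hypothetical stand-in mirroring how DataSource handles `parameters`."""
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        # Replace the None sentinel with a fresh dict for this instance.
        if self.parameters is None:
            self.parameters = {}


@dataclass
class CfgFactory:
    """Alternative: let dataclasses build a new dict per instance."""
    parameters: Dict[str, Any] = field(default_factory=dict)


a, b = Cfg(), Cfg()
a.parameters["delimiter"] = ","
# b is unaffected because each instance owns its own dict.
print(a.parameters, b.parameters)
```

One difference worth knowing: the __post_init__ version also normalizes an explicit parameters=None to {}, while the default_factory version keeps an explicitly passed None as None.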

to_dict(self) -> Dict[str, Any]

Purpose: Converts the DataSource instance to a dictionary representation with enum values converted to strings for serialization

Returns: Dictionary containing all instance attributes with source_type converted from enum to string value
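
To illustrate why the enum is converted, here is a minimal sketch that serializes a reduced copy of the class to JSON. The DataSourceType member and its value are assumptions made for the example, and only three of the fields are reproduced.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    FILE = "file"  # assumed member name and value, for illustration only


@dataclass
class DataSource:
    """Reduced copy of the documented class (three fields only)."""
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        result = asdict(self)  # recursive copy; the enum member is kept as-is
        result["source_type"] = self.source_type.value  # enum -> plain string
        return result


source = DataSource(DataSourceType.FILE, file_path="/path/to/data.csv")
payload = json.dumps(source.to_dict(), sort_keys=True)
print(payload)
```

Passing asdict(source) straight to json.dumps would raise TypeError, because a plain Enum member is not JSON serializable. Since asdict copies nested containers, editing the returned dictionary does not change the instance's parameters.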

from_dict(cls, data: Dict[str, Any]) -> 'DataSource'

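Purpose: Class method that creates a DataSource instance from a dictionary, converting a string source_type back to the enum. The conversion is written into the dictionary that was passed in, so the caller's dictionary is modified. Keys that do not match a field name raise TypeError, and a string that is not a valid DataSourceType value raises ValueError.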

Parameters:

  • data: Dictionary containing DataSource attributes, with source_type as either string or DataSourceType enum

Returns: New DataSource instance created from the dictionary data
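
Two behaviors of from_dict follow from the source code: it overwrites source_type in the caller's dictionary, and it forwards every key to the constructor. The sketch below wraps it in a helper that works on a filtered copy. The helper name load_data_source, the enum members, and the legacy_flag key are all hypothetical, and the class is a reduced copy.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    FILE = "file"  # assumed members, for illustration only
    SQL = "sql"


@dataclass
class DataSource:
    """Reduced copy of the documented class (three fields only)."""
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


def load_data_source(raw: Dict[str, Any]) -> DataSource:
    """Hypothetical wrapper: build a filtered copy so `raw` stays untouched."""
    known = {f.name for f in fields(DataSource)}
    cleaned = {k: v for k, v in raw.items() if k in known}
    return DataSource.from_dict(cleaned)


raw = {"source_type": "file", "file_path": "/path/to/data.csv", "legacy_flag": True}
source = load_data_source(raw)
print(source.source_type, raw["source_type"])
```

Silently dropping unknown keys is a design choice for this sketch. If stray keys indicate a real bug in your data, letting from_dict raise TypeError may be the better behavior.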

Attributes

Name | Type | Description | Scope
source_type | DataSourceType | Enum indicating the type of data source (required field) | instance
file_path | Optional[str] | Path to file-based data source | instance
sql_connection | Optional[str] | SQL database connection string or identifier | instance
sql_query | Optional[str] | SQL query to execute | instance
table_name | Optional[str] | Database table name | instance
user_query | Optional[str] | Original user's natural language analysis request for SQL workflow | instance
schema_file | Optional[str] | Path to JSON file containing database schema for SQL generation | instance
connection_config | Optional[str] | Path to database connection configuration file | instance
generated_sql | Optional[str] | SQL query generated by LLM from user_query | instance
query_explanation | Optional[str] | Human-readable explanation of the generated SQL query | instance
parameters | Dict[str, Any] | Additional configuration parameters as key-value pairs, initialized to empty dict if None | instance

Dependencies

  • dataclasses
  • typing
  • enum

Required Imports

from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional
from enum import Enum

Usage Example

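from dataclasses import dataclass, asdict
from typing import Optional, Dict, Any
from enum import Enum

# DataSource itself is the class shown under Source Code (smartstat_models.py).
# The enum below is an illustrative stand-in: the real DataSourceType lives in
# the same module, and its member names and values may differ.
class DataSourceType(Enum):
    FILE = 'file'
    SQL = 'sql'
    DATABASE = 'database'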

# Create a file-based data source
file_source = DataSource(
    source_type=DataSourceType.FILE,
    file_path='/path/to/data.csv',
    parameters={'delimiter': ',', 'encoding': 'utf-8'}
)

# Create a SQL data source
sql_source = DataSource(
    source_type=DataSourceType.SQL,
    sql_connection='postgresql://user:pass@localhost/db',
    sql_query='SELECT * FROM users',
    table_name='users'
)

# Create a SQL workflow source with LLM integration
workflow_source = DataSource(
    source_type=DataSourceType.SQL,
    user_query='Show me all users who signed up last month',
    schema_file='/path/to/schema.json',
    connection_config='/path/to/db_config.json'
)

# Serialize to dictionary
data_dict = file_source.to_dict()

# Deserialize from dictionary
restored_source = DataSource.from_dict(data_dict)

Best Practices

  • Always specify the source_type parameter as it is the only required field
  • Use appropriate optional fields based on the source_type (e.g., file_path for file sources, sql_connection for SQL sources)
  • The parameters dictionary is automatically initialized to an empty dict in __post_init__ if not provided
  • When serializing/deserializing, use to_dict() and from_dict() methods to ensure proper enum handling
  • For SQL workflow automation, populate user_query, schema_file, and connection_config, then store generated_sql and query_explanation after LLM processing
  • Instances are mutable: __post_init__ assigns to self.parameters, so the dataclass cannot be frozen, and attributes can be reassigned after creation
  • Ensure DataSourceType enum is properly defined with string values for serialization to work correctly
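
The SQL workflow practice above can be sketched as a two-step lifecycle: create the source with the request fields, then fill in the result fields once a query has been generated. The function stub_generate_sql is a hypothetical placeholder that returns canned text instead of calling an LLM, the enum member is an assumption, and the class is a reduced copy with only the workflow fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class DataSourceType(Enum):
    SQL = "sql"  # assumed member name and value, for illustration only


@dataclass
class DataSource:
    """Reduced copy of the documented class (workflow fields only)."""
    source_type: DataSourceType
    user_query: Optional[str] = None
    schema_file: Optional[str] = None
    connection_config: Optional[str] = None
    generated_sql: Optional[str] = None
    query_explanation: Optional[str] = None


def stub_generate_sql(user_query: str) -> Tuple[str, str]:
    """Hypothetical placeholder for the LLM step; returns canned text."""
    return "SELECT * FROM users", f"Canned query for: {user_query}"


# Step 1: describe the request.
source = DataSource(
    source_type=DataSourceType.SQL,
    user_query="Show me all users who signed up last month",
    schema_file="/path/to/schema.json",
    connection_config="/path/to/db_config.json",
)

# Step 2: store the generated query and its explanation on the same instance.
source.generated_sql, source.query_explanation = stub_generate_sql(source.user_query)
print(source.generated_sql)
```

Storing the results on the same instance keeps the request and the generated query together when the object is later serialized.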

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class DataSource_v1 92.6% similar

    A dataclass that encapsulates configuration for various data sources used in analysis, supporting file-based, SQL database, and query-based data sources.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSource 92.1% similar

    A dataclass that represents configuration for various data sources, supporting file-based, SQL database, and query-based data access patterns.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType_v1 72.8% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class DataSourceType_v2 70.2% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType 68.9% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/models.py