class DataSource_v1

Maturity: 48

A dataclass that encapsulates configuration for various data sources used in analysis, supporting file-based, SQL database, and query-based data sources.

File: /tf/active/vicechatdev/vice_ai/models.py
Lines: 1907 - 1936
Complexity: moderate

Purpose

DataSource serves as a configuration container for different types of data sources in an analysis system. It supports multiple source types (files, SQL databases, user queries) and stores related metadata including connection details, queries, schema information, and LLM-generated SQL. The class provides serialization/deserialization methods for persistence and includes fields for tracking the analysis workflow from user query to generated SQL with explanations.

Source Code

@dataclass
class DataSource:
    """Data source configuration for analysis"""
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_connection: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    user_query: Optional[str] = None  # Original user analysis request
    schema_file: Optional[str] = None  # Path to database schema JSON
    connection_config: Optional[str] = None  # Path to connection config
    generated_sql: Optional[str] = None  # LLM-generated SQL query
    query_explanation: Optional[str] = None  # Explanation of generated query
    parameters: Dict[str, Any] = None
    
    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}
    
    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary with enum values as strings"""
        result = asdict(self)
        result['source_type'] = self.source_type.value
        return result
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'DataSource':
        """Create instance from dictionary"""
        if isinstance(data.get('source_type'), str):
            data['source_type'] = DataSourceType(data['source_type'])
        return cls(**data)

Parameters

Name Type Default Kind
bases - -

Parameter Details

source_type: A DataSourceType enum value indicating the type of data source (e.g., file, SQL database, user query). This is a required field that determines which other fields are relevant.

file_path: Optional string path to a file-based data source. Used when source_type indicates a file-based source.

sql_connection: Optional string containing SQL database connection information (e.g., connection string or database URL).

sql_query: Optional string containing a SQL query to execute against the database.

table_name: Optional string specifying the name of a database table to query or analyze.

user_query: Optional string containing the original natural language analysis request from the user, used for LLM-based query generation.

schema_file: Optional string path to a JSON file containing the database schema definition.

connection_config: Optional string path to a configuration file containing database connection parameters.

generated_sql: Optional string containing SQL query generated by an LLM based on the user_query.

query_explanation: Optional string containing a human-readable explanation of the generated SQL query.

parameters: Dictionary of additional key-value parameters for flexible configuration. The declared default is None; __post_init__ replaces None with an empty dict, so the attribute is always a dict after construction.

Return Value

Instantiation returns a DataSource object with all specified attributes. The to_dict() method returns a dictionary representation with the source_type enum converted to its string value. The from_dict() class method returns a new DataSource instance reconstructed from a dictionary.

Class Interface

Methods

__post_init__(self) -> None

Purpose: Dataclass post-initialization hook that ensures the parameters attribute is initialized to an empty dictionary if None

Returns: None - modifies instance state in-place
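
The None-then-replace pattern exists because a dataclass field cannot use a mutable default such as {} directly. The sketch below illustrates the behaviour with two trimmed stand-in classes (MiniSource and MiniSourceFactory are hypothetical names, not part of models.py): one mirrors the __post_init__ approach used by DataSource, the other uses the standard-library field(default_factory=dict) alternative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MiniSource:
    """Hypothetical stand-in that mirrors DataSource's handling of parameters."""
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        # Same pattern as DataSource: swap None for a fresh dict per instance.
        if self.parameters is None:
            self.parameters = {}


@dataclass
class MiniSourceFactory:
    """Hypothetical alternative: let the dataclass build the dict itself."""
    parameters: Dict[str, Any] = field(default_factory=dict)


a, b = MiniSource(), MiniSource()
a.parameters["delimiter"] = ","

c, d = MiniSourceFactory(), MiniSourceFactory()
c.parameters["delimiter"] = ","

print(b.parameters)  # {} : b does not share a's dict
print(d.parameters)  # {} : d does not share c's dict
```

One behavioural difference is worth noting: with default_factory, an explicit parameters=None stays None, whereas the __post_init__ pattern normalizes it to an empty dict.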

to_dict(self) -> Dict[str, Any]

Purpose: Converts the DataSource instance to a dictionary representation with enum values converted to strings for serialization

Returns: Dictionary containing all instance attributes with source_type enum converted to its string value
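
To show why the enum conversion matters, here is a minimal sketch that serializes a source to JSON. It uses a trimmed copy of the class with only three fields, and the DataSourceType members are assumed for illustration; the real enum in models.py may differ.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members, for illustration only.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the documented class: only the fields this sketch needs.
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    def to_dict(self) -> Dict[str, Any]:
        result = asdict(self)
        result["source_type"] = self.source_type.value
        return result


source = DataSource(DataSourceType.FILE, file_path="/path/to/data.csv")

# asdict() alone keeps the enum member, which json.dumps cannot encode.
try:
    json.dumps(asdict(source))
    raw_ok = True
except TypeError:
    raw_ok = False

# to_dict() swaps the enum for its string value, so encoding succeeds.
payload = json.dumps(source.to_dict())
print(raw_ok, payload)
```

Because asdict() builds new containers, the parameters dict in the result is a copy, so editing the serialized form does not alter the original instance.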

from_dict(cls, data: Dict[str, Any]) -> 'DataSource'

Purpose: Class method that creates a DataSource instance from a dictionary, handling string-to-enum conversion for source_type

Parameters:

  • data: Dictionary containing DataSource attributes, with source_type as either a string or DataSourceType enum

Returns: New DataSource instance constructed from the dictionary data
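
Two properties of from_dict follow from the code as written: it rewrites source_type inside the dictionary it is given, and it passes every key straight to the constructor. The sketch below demonstrates both with a trimmed copy of the class; the DataSourceType members and the load_source wrapper are assumptions introduced for illustration, not part of models.py.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataSourceType(Enum):
    # Assumed members, for illustration only.
    FILE = "file"
    SQL = "sql"


@dataclass
class DataSource:
    # Trimmed copy of the documented class: only the fields this sketch needs.
    source_type: DataSourceType
    file_path: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None

    def __post_init__(self):
        if self.parameters is None:
            self.parameters = {}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DataSource":
        if isinstance(data.get("source_type"), str):
            data["source_type"] = DataSourceType(data["source_type"])
        return cls(**data)


def load_source(data: Dict[str, Any]) -> DataSource:
    """Hypothetical wrapper: shallow-copy first so the caller's dict is untouched."""
    return DataSource.from_dict(dict(data))


# Calling from_dict directly replaces the string in the caller's dict.
mutated = {"source_type": "sql"}
DataSource.from_dict(mutated)

# Going through the wrapper leaves the original dict as it was.
raw = {"source_type": "file", "file_path": "/path/to/data.csv"}
restored = load_source(raw)

# An unknown enum value raises ValueError; an unknown key raises TypeError.
errors = []
for bad in ({"source_type": "parquet"}, {"source_type": "file", "unknown": 1}):
    try:
        load_source(bad)
    except (ValueError, TypeError) as exc:
        errors.append(type(exc).__name__)
print(errors)
```

Copying before conversion is a small cost and keeps the input reusable, for example when the same dictionary is later written back to storage as JSON.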

Attributes

| Name | Type | Description | Scope |
| --- | --- | --- | --- |
| source_type | DataSourceType | Enum indicating the type of data source (required field) | instance |
| file_path | Optional[str] | Path to a file-based data source | instance |
| sql_connection | Optional[str] | SQL database connection string or identifier | instance |
| sql_query | Optional[str] | SQL query to execute against the database | instance |
| table_name | Optional[str] | Name of the database table to query | instance |
| user_query | Optional[str] | Original natural language analysis request from the user | instance |
| schema_file | Optional[str] | Path to JSON file containing database schema definition | instance |
| connection_config | Optional[str] | Path to configuration file with connection parameters | instance |
| generated_sql | Optional[str] | SQL query generated by LLM from user_query | instance |
| query_explanation | Optional[str] | Human-readable explanation of the generated SQL query | instance |
| parameters | Dict[str, Any] | Additional configuration parameters as key-value pairs, initialized to empty dict if None | instance |

Dependencies

  • dataclasses
  • typing
  • enum

Required Imports

from dataclasses import dataclass, asdict
from typing import Dict, Any, Optional
from enum import Enum

Conditional/Optional Imports

None of the imports above is conditional. The class has one further requirement:

DataSourceType enum definition

Condition: Always required for the source_type field. It must be defined in the same module or imported.

Usage Example

# DataSourceType is defined here with assumed members; the real enum may differ
from enum import Enum

class DataSourceType(Enum):
    FILE = 'file'
    SQL = 'sql'
    USER_QUERY = 'user_query'

# Create a file-based data source
file_source = DataSource(
    source_type=DataSourceType.FILE,
    file_path='/path/to/data.csv',
    parameters={'delimiter': ',', 'encoding': 'utf-8'}
)

# Create a SQL-based data source
sql_source = DataSource(
    source_type=DataSourceType.SQL,
    sql_connection='postgresql://user:pass@localhost/db',
    table_name='sales_data',
    sql_query='SELECT * FROM sales_data WHERE year = 2023'
)

# Create a user query-based source with LLM generation
query_source = DataSource(
    source_type=DataSourceType.USER_QUERY,
    user_query='Show me total sales by region',
    schema_file='/path/to/schema.json',
    generated_sql='SELECT region, SUM(amount) FROM sales GROUP BY region',
    query_explanation='This query aggregates sales amounts by region'
)

# Serialize to dictionary
data_dict = file_source.to_dict()

# Deserialize from dictionary
restored_source = DataSource.from_dict(data_dict)

Best Practices

  • Always specify the source_type parameter as it determines which other fields are relevant
  • Pass parameters explicitly when you need custom configuration; if it is omitted or None, __post_init__ replaces it with an empty dict
  • Use to_dict() for serialization before storing to JSON or databases to ensure enum values are properly converted to strings
  • Use from_dict() class method for deserialization to ensure proper enum reconstruction
  • For SQL sources, provide either sql_query or table_name, or both depending on your use case
  • When using LLM-generated queries, populate user_query, generated_sql, and query_explanation for full traceability
  • Keep the database schema and connection settings in separate files and reference them through schema_file and connection_config, for better security and maintainability
  • The class is mutable: the dataclass is not declared with frozen=True, so attributes can be reassigned after creation
  • Validate that required fields for your specific source_type are populated after instantiation
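
The class itself performs no validation, so the last point above has to be handled by the caller. A minimal sketch of one way to do it follows. The validate_source helper, the DataSourceType members, and the per-type rules are all assumptions drawn from the field descriptions on this page, and the class is a trimmed copy with only the fields being checked.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DataSourceType(Enum):
    # Assumed members, for illustration only.
    FILE = "file"
    SQL = "sql"
    USER_QUERY = "user_query"


@dataclass
class DataSource:
    # Trimmed copy of the documented class: only the fields being validated.
    source_type: DataSourceType
    file_path: Optional[str] = None
    sql_query: Optional[str] = None
    table_name: Optional[str] = None
    user_query: Optional[str] = None


def validate_source(source: DataSource) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if none are found."""
    problems = []
    if source.source_type is DataSourceType.FILE and not source.file_path:
        problems.append("file source requires file_path")
    if source.source_type is DataSourceType.SQL and not (
        source.sql_query or source.table_name
    ):
        problems.append("SQL source requires sql_query or table_name")
    if source.source_type is DataSourceType.USER_QUERY and not source.user_query:
        problems.append("user-query source requires user_query")
    return problems


ok = DataSource(DataSourceType.SQL, table_name="sales_data")
incomplete = DataSource(DataSourceType.FILE)

print(validate_source(ok))
print(validate_source(incomplete))
```

Returning a list rather than raising on the first problem lets a caller report every missing field at once; raising ValueError inside __post_init__ would be an equally reasonable choice.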

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class DataSource_v2 92.6% similar

    A dataclass that encapsulates configuration for various data sources including files, SQL databases, and SQL workflow metadata.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class DataSource 90.3% similar

    A dataclass that represents configuration for various data sources, supporting file-based, SQL database, and query-based data access patterns.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType_v1 71.2% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class DataSourceType_v2 69.8% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSourceType 66.6% similar

    An enumeration class that defines the different types of data sources available in the system.

    From: /tf/active/vicechatdev/vice_ai/models.py