SimpleDataHandle - Code Extractor

class SimpleDataHandle

Maturity: 38

A data handler class that manages multiple data sources with different types (dataframes, vector stores, databases) and their associated processing configurations.

File:
/tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

Lines:
718 - 787

Complexity:
moderate

Purpose

SimpleDataHandle provides a centralized registry for managing heterogeneous data sources in a data processing or RAG (Retrieval-Augmented Generation) pipeline. It stores data along with metadata including type, filters, processing steps, inclusion limits, and instructions for how to use each data source. The class automatically configures default settings based on data type and can convert documents to vector stores using FAISS and OpenAI embeddings.

Source Code

class SimpleDataHandle:
     
    def __init__(self):
        self.handlers = {}
        return
     
    def add_data(self, name:str, type:str, data:Any, filters:str="", processing_steps:List[str]=[], inclusions:int=10,instructions:str=""):
        ## Default values for type, filters, processing_steps, instructions
        if type == "":
            type = "text"
        if type=="dataframe":
            filters=""
            if processing_steps==[]:
                processing_steps=["markdown"]
            if instructions=="":
                instructions="""Start with a summary of the internal data, using summary tables when possible. If the internal data is presented as chemical formulas in SMILES format, try to find the corresponding chemical names and properties and report those in your answer.
                            Use them to compare it to other chemical data in the external sources."""
        # Applies to ready-made vector stores and to documents awaiting conversion
        if type in ("vectorstore", "to_vectorstore"):
            if processing_steps==[]:
                processing_steps=["similarity"]
            if instructions=="":
                instructions="""Provide a summary of the given context data extracted from lab data and reports and from scientific literature, using summary tables when possible.
                            """
        if type =="to_vectorstore":
            embeddings = OpenAIEmbeddings()
            index = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))
            vector_store = FAISS(
                embedding_function=embeddings,
                docstore=InMemoryDocstore(),
                index_to_docstore_id={},
                index=index
            )
            uuids = [str(uuid4()) for _ in range(len(data))]
            vector_store.add_documents(
                documents=data,
                ids=uuids,
            )
            data=vector_store
            type="vectorstore"
        if type == "db_search":
            if processing_steps==[]:
                processing_steps=["similarity"]
            if instructions=="":
                instructions="""Provide a summary of the given context data extracted from lab data and reports and from scientific literature, using summary tables when possible.
                            """
        if type=="chromaDB":
            if processing_steps==[]:
                processing_steps=["similarity"]
            if instructions=="":    
                instructions="""Provide a summary of the given context data extracted from lab data and reports and from scientific literature, using summary tables when possible.
                            """
        
        self.handlers[name] = {
            "type" : type,
            "data" : data,
            "filters" : filters,
            "processing_steps" : processing_steps,
            "inclusions" : inclusions,
            "instructions" : instructions
        }
        return
     
    def remove_data(self, name:str):
        if name in self.handlers:
            del self.handlers[name]
        return
    
    def clear_data(self):
        self.handlers = {}
        return

Parameters

Name	Type	Default	Kind
`bases`	-	-

Parameter Details

__init__: No parameters required. Initializes an empty handlers dictionary to store data sources.

Return Value

Instantiating the class returns a SimpleDataHandle instance; __init__ itself returns None. The add_data, remove_data, and clear_data methods also return None, each ending in a bare return statement. The class maintains state through the handlers dictionary, which stores each data source's configuration as a nested dictionary with the keys type, data, filters, processing_steps, inclusions, and instructions.

Class Interface

Methods

`__init__(self) -> None`

Purpose: Initialize a new SimpleDataHandle instance with an empty handlers dictionary

Returns: None

`add_data(self, name: str, type: str, data: Any, filters: str = '', processing_steps: List[str] = [], inclusions: int = 10, instructions: str = '') -> None`

Purpose: Add a data source to the handler with associated configuration. Automatically sets defaults based on type and converts 'to_vectorstore' type to FAISS vector stores.

Parameters:

name: Unique identifier for this data source, used as dictionary key
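type: Data type: 'text', 'dataframe', 'vectorstore', 'to_vectorstore', 'db_search', or 'chromaDB'. An empty string is treated as 'text'; 'to_vectorstore' is stored as 'vectorstore' once the documents have been converted.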
data: The actual data object (DataFrame, list of Documents, vector store, etc.)
filters: Filter criteria for the data (empty string by default, forced empty for dataframes)
processing_steps: List of processing steps to apply (e.g., ['markdown'], ['similarity']). Defaults set by type.
inclusions: Number of items to include in processing (default 10)
instructions: Instructions for how to use this data source in downstream processing. Defaults set by type.

Returns: None (modifies self.handlers dictionary in place)
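
The type-based defaults described above can be pictured as a lookup table. The sketch below is a hypothetical restatement rather than the class's own code: DEFAULT_STEPS and resolve_steps are names invented for illustration, and giving 'text' no default steps is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical table of default processing steps per data type.
DEFAULT_STEPS: Dict[str, List[str]] = {
    "dataframe": ["markdown"],
    "vectorstore": ["similarity"],
    "to_vectorstore": ["similarity"],
    "db_search": ["similarity"],
    "chromaDB": ["similarity"],
}


def resolve_steps(type_: str, processing_steps: Optional[List[str]] = None) -> List[str]:
    """Return the caller's steps, or a fresh copy of the defaults for type_."""
    if processing_steps:
        return list(processing_steps)
    # An empty type falls back to "text", which has no default steps in this sketch.
    return list(DEFAULT_STEPS.get(type_ or "text", []))


print(resolve_steps("dataframe"))
print(resolve_steps("chromaDB", ["keyword"]))
```

Returning a copy each time means no two data sources end up sharing one list object, which also sidesteps the usual pitfall of a mutable default argument.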

`remove_data(self, name: str) -> None`

Purpose: Remove a data source from the handler by name

Parameters:

name: The name/key of the data source to remove

Returns: None (modifies self.handlers dictionary in place, silently does nothing if name not found)
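
The "silently does nothing" behaviour matches what dict.pop does when given a default. A minimal sketch on a plain dictionary standing in for self.handlers (the entry shown is illustrative):

```python
# Plain dict standing in for SimpleDataHandle.handlers.
handlers = {"my_dataframe": {"type": "dataframe"}}

# Removing a known name deletes its entry.
handlers.pop("my_dataframe", None)

# Removing an unknown name is a no-op rather than a KeyError.
handlers.pop("does_not_exist", None)

print(handlers)
```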

`clear_data(self) -> None`

Purpose: Remove all data sources from the handler, resetting to empty state

Returns: None (resets self.handlers to empty dictionary)
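
Because clear_data assigns a new dictionary instead of emptying the existing one, any reference to the old dictionary obtained earlier still sees the old entries. The sketch below uses a hypothetical stand-in class, Registry, that resets its state the same way:

```python
class Registry:
    """Minimal stand-in that resets state the same way clear_data does."""

    def __init__(self):
        self.handlers = {}

    def clear_data(self):
        # Rebinds the attribute to a new dict; the old dict is left untouched.
        self.handlers = {}


registry = Registry()
registry.handlers["a"] = {"type": "text"}
alias = registry.handlers  # reference taken before the reset

registry.clear_data()
print(registry.handlers)  # {}
print(alias)              # {'a': {'type': 'text'}}
```

Code that needs every holder of the dictionary to see the reset would call self.handlers.clear() instead; which behaviour is intended here is not stated in the source.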

Attributes

Name	Type	Description	Scope
`handlers`	Dict[str, Dict[str, Any]]	Dictionary mapping data source names to their configuration dictionaries. Each configuration contains keys: 'type', 'data', 'filters', 'processing_steps', 'inclusions', 'instructions'	instance
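
To make the shape of handlers concrete, the sketch below builds one entry by hand and filters entries by type. The values are illustrative and names_by_type is a hypothetical helper, not part of the class.

```python
from typing import Any, Dict, List

# Hand-built stand-in for SimpleDataHandle.handlers; the values are illustrative.
handlers: Dict[str, Dict[str, Any]] = {
    "my_dataframe": {
        "type": "dataframe",
        "data": [[1, 3], [2, 4]],  # placeholder for a real DataFrame
        "filters": "",
        "processing_steps": ["markdown"],
        "inclusions": 5,
        "instructions": "Start with a summary of the internal data.",
    },
}


def names_by_type(handlers: Dict[str, Dict[str, Any]], wanted: str) -> List[str]:
    """Return the names of all registered sources whose type matches wanted."""
    return [name for name, cfg in handlers.items() if cfg["type"] == wanted]


print(names_by_type(handlers, "dataframe"))
```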

Dependencies

typing
panel
langchain_community
langchain_openai
uuid
pandas
sentence_transformers
faiss
numpy
neo4j
openai
chromadb
tiktoken
pybtex

Required Imports

from typing import List, Any, Dict
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from uuid import uuid4
import faiss

Conditional/Optional Imports

These imports are only needed under specific conditions:

from langchain_community.embeddings import OpenAIEmbeddings

Condition: only when adding data with type='to_vectorstore'

Required (conditional)

from langchain_community.vectorstores import FAISS

Condition: only when adding data with type='to_vectorstore'

Required (conditional)

import faiss

Condition: only when adding data with type='to_vectorstore'

Required (conditional)
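
One way to confirm that these conditional packages are installed before taking the 'to_vectorstore' path is to probe for them with the standard library. This is a sketch only; missing_modules is a hypothetical helper and the list of names passed to it is an assumption based on the imports above.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Top-level packages behind the 'to_vectorstore' path.
missing = missing_modules(["faiss", "langchain_community"])
if missing:
    print("Install before using type='to_vectorstore':", ", ".join(missing))
```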

Usage Example

# Initialize the data handler
handler = SimpleDataHandle()

# Add a dataframe
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
handler.add_data(
    name='my_dataframe',
    type='dataframe',
    data=df,
    inclusions=5
)

# Add documents to be converted to vector store
from langchain_core.documents import Document
docs = [Document(page_content='text1'), Document(page_content='text2')]
handler.add_data(
    name='my_vectors',
    type='to_vectorstore',
    data=docs
)

# Access stored data
df_config = handler.handlers['my_dataframe']
print(df_config['type'])  # 'dataframe'
print(df_config['processing_steps'])  # ['markdown']

# Remove a data source
handler.remove_data('my_dataframe')

# Clear all data
handler.clear_data()

Best Practices

Create a SimpleDataHandle instance before adding data sources; all state lives on that instance
Use descriptive unique names for each data source as they serve as dictionary keys
The 'to_vectorstore' type requires documents in LangChain Document format and will automatically convert them to FAISS vector stores
Default processing_steps and instructions are automatically set based on data type, but can be overridden
The handlers dictionary is the primary state - access it directly to retrieve stored configurations
When using 'to_vectorstore', ensure OpenAI API credentials are configured before calling add_data
The inclusions parameter (default 10) is only stored by this class; it presumably caps how many items downstream processing includes
Remove unused data sources with remove_data() to free memory, especially for large vector stores
Use clear_data() to reset the entire handler state
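
The credentials advice above can be turned into a simple check made before calling add_data. The sketch assumes the key is supplied through the OPENAI_API_KEY environment variable, which OpenAIEmbeddings reads by default; has_openai_key is a hypothetical helper, and a key passed some other way would not be detected by it.

```python
import os
from typing import Mapping, Optional


def has_openai_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a non-empty OPENAI_API_KEY is present in env."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; type='to_vectorstore' would fail when embedding.")
```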

Similar Components

AI-powered semantic similarity - components with related functionality:

class DataSource 60.9% similar

A dataclass that represents configuration for various data sources, supporting file-based, SQL database, and query-based data access patterns.
From: /tf/active/vicechatdev/vice_ai/models.py
class DataSource_v2 58.4% similar

A dataclass that encapsulates configuration for various data sources including files, SQL databases, and SQL workflow metadata.
From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
class DataSource_v1 57.9% similar

A dataclass that encapsulates configuration for various data sources used in analysis, supporting file-based, SQL database, and query-based data sources.
From: /tf/active/vicechatdev/vice_ai/models.py
class DataProcessor 54.7% similar

Handles data loading, validation, and preprocessing
From: /tf/active/vicechatdev/full_smartstat/data_processor.py
class DataProcessor_v1 54.5% similar

Handles data loading, validation, and preprocessing
From: /tf/active/vicechatdev/smartstat/data_processor.py
