🔍 Code Extractor

class MyEmbeddingFunction

Maturity: 51

Custom embedding function class that integrates OpenAI's embedding API with Chroma DB for generating vector embeddings from text documents.

File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py
Lines: 819-856
Complexity: moderate

Purpose

This class serves as an adapter between Chroma DB's EmbeddingFunction interface and OpenAI's embedding API. It enables Chroma DB to use OpenAI's embedding models (like text-embedding-3-small) for converting text documents into vector representations. The class handles API authentication, embedding generation, and error fallback scenarios. It's designed to be used as a custom embedding function when initializing Chroma collections.

Source Code

class MyEmbeddingFunction(EmbeddingFunction):
    """
    Custom embedding function for Chroma DB using OpenAI embeddings.
    """
    
    def __init__(self, model_name: str, embedding_model: str, api_key: str):
        self.model_name = model_name
        self.embedding_model = embedding_model
        self.api_key = api_key
        
        # Set up OpenAI client
        os.environ["OPENAI_API_KEY"] = api_key
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)
    
    def __call__(self, input: Documents) -> Embeddings:
        """
        Generate embeddings for input documents.
        
        Args:
            input: List of document texts
            
        Returns:
            List of embedding vectors
        """
        try:
            response = self.client.embeddings.create(
                input=input,
                model=self.embedding_model
            )
            
            embeddings = [data.embedding for data in response.data]
            return embeddings
            
        except Exception as e:
            print(f"Error generating embeddings: {e}")
            # Return zero embeddings as fallback
            return [[0.0] * 1536 for _ in input]  # 1536 is dimension for text-embedding-3-small

Parameters

Name   Type               Default  Kind
bases  EmbeddingFunction  -        -

Parameter Details

model_name: Name identifier for the model being used. This parameter is stored but not actively used in the current implementation; it appears to be intended for tracking or logging purposes.

embedding_model: The specific OpenAI embedding model to use (e.g., 'text-embedding-3-small', 'text-embedding-ada-002'). This determines the embedding dimensions and quality.

api_key: OpenAI API key for authentication. This key is used to initialize the OpenAI client and is also set as an environment variable.
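Because the embedding dimension depends on the chosen model, a small lookup helper can make the error fallback model-aware instead of hardcoding 1536. This helper is not part of the extracted class; it is a sketch using the published output dimensions of OpenAI's current embedding models.

```python
# Published output dimensions for common OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def fallback_dimension(embedding_model: str, default: int = 1536) -> int:
    """Return the expected vector size for a model, defaulting to 1536."""
    return EMBEDDING_DIMENSIONS.get(embedding_model, default)
```

Inside __call__, the fallback line could then read `[[0.0] * fallback_dimension(self.embedding_model) for _ in input]`, so switching to text-embedding-3-large would not silently produce vectors of the wrong size.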

Return Value

Instantiation returns a MyEmbeddingFunction object that can be called as a function. When called (via __call__), it returns a list of embedding vectors (Embeddings type), where each embedding is a list of floats representing the vector for the corresponding input document. On error, returns zero-filled vectors with dimension 1536.
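Since errors produce zero-filled vectors rather than raising, callers may want to detect that fallback before storing results. A minimal check (a hypothetical helper, not part of the source) could look like this:

```python
def is_zero_embedding(vector) -> bool:
    """True if every component is exactly 0.0 (the class's error fallback)."""
    return all(v == 0.0 for v in vector)
```

Filtering out such vectors before inserting into a collection prevents degenerate zero vectors from polluting similarity search results.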

Class Interface

Methods

__init__(self, model_name: str, embedding_model: str, api_key: str)

Purpose: Initializes the embedding function with OpenAI credentials and model configuration

Parameters:

  • model_name: Name identifier for the model (stored but not actively used)
  • embedding_model: OpenAI embedding model name (e.g., 'text-embedding-3-small')
  • api_key: OpenAI API key for authentication

Returns: None (constructor)

__call__(self, input: Documents) -> Embeddings

Purpose: Generates embedding vectors for the provided input documents using OpenAI's API

Parameters:

  • input: List of document texts (strings) to generate embeddings for

Returns: List of embedding vectors, where each vector is a list of floats. Returns zero-filled vectors (dimension 1536) if an error occurs.

Attributes

Name             Type    Description                                                     Scope
model_name       str     Stores the model name identifier passed during initialization  instance
embedding_model  str     The OpenAI embedding model name used to generate embeddings    instance
api_key          str     The OpenAI API key used for authentication                     instance
client           OpenAI  OpenAI client instance used to make embedding API calls        instance

Dependencies

  • os
  • openai
  • chromadb

Required Imports

import os
from chromadb import Documents, EmbeddingFunction, Embeddings

Conditional/Optional Imports

These imports are only needed under specific conditions:

from openai import OpenAI

Condition: imported inside the __init__ method, so the openai package is only needed once the class is instantiated

Status: required (conditional)

Usage Example

# Initialize the embedding function
api_key = 'your-openai-api-key'
embedding_fn = MyEmbeddingFunction(
    model_name='my-model',
    embedding_model='text-embedding-3-small',
    api_key=api_key
)

# Use with Chroma DB
import chromadb
client = chromadb.Client()
collection = client.create_collection(
    name='my_collection',
    embedding_function=embedding_fn
)

# Or call directly to generate embeddings
documents = ['Hello world', 'Another document']
embeddings = embedding_fn(documents)
print(f'Generated {len(embeddings)} embeddings')
print(f'Embedding dimension: {len(embeddings[0])}')

Best Practices

  • Always provide a valid OpenAI API key to avoid authentication errors
  • The class modifies the global environment variable OPENAI_API_KEY, which may affect other parts of your application
  • Error handling returns zero-filled embeddings (1536 dimensions) as fallback - ensure your application can handle these gracefully
  • The hardcoded dimension of 1536 in the error handler is specific to text-embedding-3-small; if using different models, this may need adjustment
  • The model_name parameter is stored but unused - consider removing it or implementing logging/tracking functionality
  • This class is designed to be instantiated once and reused for multiple embedding operations
  • The __call__ method makes instances callable, allowing them to be used directly as functions
  • Consider implementing retry logic for transient API failures instead of immediately falling back to zero embeddings
  • The class creates a new OpenAI client on each instantiation; avoid creating multiple instances unnecessarily

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class MyEmbeddingFunction_v1 87.0% similar

    A custom embedding function class that generates embeddings for documents using OpenAI's API, with built-in text summarization for long documents and token management.

    From: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py
  • class DocChatEmbeddingFunction 82.8% similar

    A custom ChromaDB embedding function that generates OpenAI embeddings with automatic text summarization for documents exceeding token limits.

    From: /tf/active/vicechatdev/docchat/document_indexer.py
  • class MyEmbeddingFunction_v2 80.5% similar

    A custom embedding function class that generates embeddings for text documents using OpenAI's embedding models, with automatic text summarization and token management for large documents.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py
  • class MyEmbeddingFunction_v3 78.5% similar

    A custom embedding function class that generates embeddings for text documents using OpenAI's embedding models, with automatic text summarization and token limit handling for large documents.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • class DocumentIndexer 58.3% similar

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    From: /tf/active/vicechatdev/docchat/document_indexer.py