class MyEmbeddingFunction
Custom embedding function class that integrates OpenAI's embedding API with Chroma DB for generating vector embeddings from text documents.
/tf/active/vicechatdev/project_victoria_disclosure_generator.py
819 - 856
moderate
Purpose
This class serves as an adapter between Chroma DB's EmbeddingFunction interface and OpenAI's embedding API. It enables Chroma DB to use OpenAI's embedding models (like text-embedding-3-small) for converting text documents into vector representations. The class handles API authentication, embedding generation, and error fallback scenarios. It's designed to be used as a custom embedding function when initializing Chroma collections.
Source Code
class MyEmbeddingFunction(EmbeddingFunction):
    """
    Custom embedding function for Chroma DB using OpenAI embeddings.
    """

    def __init__(self, model_name: str, embedding_model: str, api_key: str):
        self.model_name = model_name
        self.embedding_model = embedding_model
        self.api_key = api_key

        # Set up OpenAI client
        os.environ["OPENAI_API_KEY"] = api_key
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)

    def __call__(self, input: Documents) -> Embeddings:
        """
        Generate embeddings for input documents.

        Args:
            input: List of document texts

        Returns:
            List of embedding vectors
        """
        try:
            response = self.client.embeddings.create(
                input=input,
                model=self.embedding_model
            )
            embeddings = [data.embedding for data in response.data]
            return embeddings
        except Exception as e:
            print(f"Error generating embeddings: {e}")
            # Return zero embeddings as fallback
            return [[0.0] * 1536 for _ in input]  # 1536 is dimension for text-embedding-3-small
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | EmbeddingFunction | - | |
Parameter Details
model_name: Name identifier for the model. The value is stored on the instance but never read by the current implementation; it appears to be intended for tracking or logging.
embedding_model: The OpenAI embedding model to use (e.g., 'text-embedding-3-small', 'text-embedding-ada-002'). The model determines the dimension and quality of the returned vectors.
api_key: OpenAI API key for authentication. The key is passed to the OpenAI client and is also written to the OPENAI_API_KEY environment variable.
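Because the embedding model fixes the vector length, a fallback sized for one model does not necessarily fit another. The following is a minimal sketch of a per-model lookup, not part of the original class: `DEFAULT_DIMENSIONS` and `zero_fallback` are hypothetical names, and the table lists the default output sizes of three OpenAI models only.

```python
from typing import List

# Default output dimensions for some OpenAI embedding models.
# Hypothetical helper table; it does not exist in the original source.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def zero_fallback(embedding_model: str, count: int) -> List[List[float]]:
    """Return `count` zero vectors sized for `embedding_model`.

    Raises KeyError for models missing from the table rather than guessing.
    """
    dimension = DEFAULT_DIMENSIONS[embedding_model]
    return [[0.0] * dimension for _ in range(count)]


vectors = zero_fallback("text-embedding-3-large", 2)
print(len(vectors), len(vectors[0]))  # 2 3072
```

Raising on an unknown model is a deliberate choice in this sketch: a wrongly sized zero vector would fail later and less clearly, when Chroma compares it against the collection's dimension.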
Return Value
Instantiation returns a MyEmbeddingFunction object that can be called as a function. When called (via __call__), it returns a list of embedding vectors (Embeddings type), where each embedding is a list of floats representing the vector for the corresponding input document. If the API call raises any exception, it instead prints the error and returns one zero-filled vector of dimension 1536 per input document.
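Since the error path returns zero vectors rather than raising, a caller cannot tell success from failure by the return type alone. A small check like the one below, a sketch with the hypothetical helper name `is_fallback_batch`, can flag a failed call before the vectors are stored. It assumes, as in the source, that a failure zeroes every vector in the batch.

```python
from typing import Sequence


def is_fallback_batch(embeddings: Sequence[Sequence[float]]) -> bool:
    """Return True if the batch is non-empty and every vector is all zeros."""
    return bool(embeddings) and all(not any(vec) for vec in embeddings)


# Stand-in data; real vectors would have 1536 components.
good_batch = [[0.12, -0.03, 0.5], [0.0, 0.4, 0.0]]
failed_batch = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

print(is_fallback_batch(good_batch))    # False
print(is_fallback_batch(failed_batch))  # True
```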
Class Interface
Methods
__init__(self, model_name: str, embedding_model: str, api_key: str)
Purpose: Initializes the embedding function with OpenAI credentials and model configuration
Parameters:
model_name: Name identifier for the model (stored but not actively used)
embedding_model: OpenAI embedding model name (e.g., 'text-embedding-3-small')
api_key: OpenAI API key for authentication
Returns: None (constructor)
__call__(self, input: Documents) -> Embeddings
Purpose: Generates embedding vectors for the provided input documents using OpenAI's API
Parameters:
input: List of document texts (strings) to generate embeddings for
Returns: List of embedding vectors, where each vector is a list of floats. Returns zero-filled vectors (dimension 1536) if an error occurs.
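The extraction step inside __call__ relies only on the response exposing a `.data` list whose items each carry an `.embedding` attribute. The sketch below exercises that step offline against a stub; `FakeEmbeddings` and `fake_client` are hypothetical stand-ins invented for illustration, not part of the OpenAI library, and the returned numbers are meaningless placeholders.

```python
from types import SimpleNamespace
from typing import List


class FakeEmbeddings:
    """Hypothetical stand-in for `client.embeddings`; makes no network calls."""

    def create(self, input: List[str], model: str) -> SimpleNamespace:
        # Mimic the response shape used by __call__: a `.data` list whose
        # items each expose an `.embedding` attribute.
        data = [
            SimpleNamespace(embedding=[float(len(text)), 0.0]) for text in input
        ]
        return SimpleNamespace(data=data)


fake_client = SimpleNamespace(embeddings=FakeEmbeddings())

# Same extraction step as in __call__, run against the stub.
response = fake_client.embeddings.create(
    input=["Hello world", "Hi"], model="text-embedding-3-small"
)
embeddings = [item.embedding for item in response.data]
print(embeddings)  # [[11.0, 0.0], [2.0, 0.0]]
```

A stub of this shape could be assigned to an instance's `client` attribute in unit tests, so the class can be tested without an API key.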
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| model_name | str | Stores the model name identifier passed during initialization | instance |
| embedding_model | str | The OpenAI embedding model name used for generating embeddings | instance |
| api_key | str | The OpenAI API key used for authentication | instance |
| client | OpenAI | OpenAI client instance used to make API calls for embedding generation | instance |
Dependencies
- os
- openai
- chromadb
Required Imports
import os
from chromadb import Documents
from chromadb import EmbeddingFunction
from chromadb import Embeddings
Conditional/Optional Imports
These imports are only needed under specific conditions:
from openai import OpenAI
Condition: imported inside __init__ method when the class is instantiated
Required (conditional)
Usage Example
# Initialize the embedding function
api_key = 'your-openai-api-key'
embedding_fn = MyEmbeddingFunction(
    model_name='my-model',
    embedding_model='text-embedding-3-small',
    api_key=api_key
)

# Use with Chroma DB
import chromadb
client = chromadb.Client()
collection = client.create_collection(
    name='my_collection',
    embedding_function=embedding_fn
)

# Or call directly to generate embeddings
documents = ['Hello world', 'Another document']
embeddings = embedding_fn(documents)
print(f'Generated {len(embeddings)} embeddings')
print(f'Embedding dimension: {len(embeddings[0])}')
Best Practices
- Always provide a valid OpenAI API key to avoid authentication errors
- The class modifies the global environment variable OPENAI_API_KEY, which may affect other parts of your application
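- On any exception the class prints the error and returns zero-filled embeddings (1536 dimensions) instead of raising; ensure your application detects and handles these, since they carry no semantic information
- The hardcoded fallback dimension of 1536 matches text-embedding-3-small (and text-embedding-ada-002); models with a different output size, such as text-embedding-3-large (3072 by default), require adjusting it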
- The model_name parameter is stored but unused - consider removing it or implementing logging/tracking functionality
- This class is designed to be instantiated once and reused for multiple embedding operations
- The __call__ method makes instances callable, allowing them to be used directly as functions
- Consider implementing retry logic for transient API failures instead of immediately falling back to zero embeddings
- The class creates a new OpenAI client on each instantiation - avoid creating multiple instances unnecessarily
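The retry suggestion above can be sketched as a small wrapper with exponential backoff. This is an illustration under stated assumptions, not the author's implementation: `embed_with_retry` and its parameters are hypothetical, and the wrapped callable must raise on failure. The existing __call__ swallows exceptions, so the wrapper would need to surround the raising API call rather than the class instance itself.

```python
import time
from typing import Callable, List, Sequence

Vector = List[float]


def embed_with_retry(
    embed: Callable[[Sequence[str]], List[Vector]],
    texts: Sequence[str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Vector]:
    """Call `embed(texts)`, retrying with exponential backoff on any exception.

    Re-raises the last exception once `attempts` calls have failed, so the
    caller never receives silent zero vectors.
    """
    for attempt in range(attempts):
        try:
            return embed(texts)
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Demo with a stand-in embedder that fails twice, then succeeds.
calls = {"count": 0}


def flaky_embed(texts: Sequence[str]) -> List[Vector]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return [[1.0, 2.0] for _ in texts]


# The sleep function is replaced with a no-op so the demo runs instantly.
result = embed_with_retry(flaky_embed, ["a", "b"], sleep=lambda seconds: None)
print(result, calls["count"])  # [[1.0, 2.0], [1.0, 2.0]] 3
```

Catching every `Exception` keeps the sketch short; a production version would likely retry only on errors known to be transient, such as rate limits or timeouts.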
Similar Components
AI-powered semantic similarity - components with related functionality:
- class MyEmbeddingFunction_v1 87.0% similar
- class DocChatEmbeddingFunction 82.8% similar
- class MyEmbeddingFunction_v2 80.5% similar
- class MyEmbeddingFunction_v3 78.5% similar
- class DocumentIndexer 58.3% similar