class OpenAIResponsesLLM
Adapter class for OpenAI's Responses API, designed for the GPT-5 model family, with automatic fallback to stable models when responses fail.
File: /tf/active/vicechatdev/docchat/llm_factory.py
Lines: 22 - 72
Complexity: moderate
Purpose
This class provides a robust interface to OpenAI's Responses API with layered fallback behavior. It first calls the Responses API (required for GPT-5 models); if that returns empty content, it falls back to the Chat Completions API with special parameters, and finally routes to a stable GPT-4o model if content is still empty. (Note that the fallbacks are triggered by empty output; as written, an exception raised by the Responses API call itself propagates.) This keeps LLM responses reliable even when newer APIs have issues.
Source Code
class OpenAIResponsesLLM:
    """Adapter using OpenAI Responses API (required/ideal for GPT-5 family)."""

    def __init__(self, model: str, api_key: Optional[str] = None, max_output_tokens: int = 4096, fallback_model: str = 'gpt-4o'):
        from openai import OpenAI  # lazy import
        self.client = OpenAI(api_key=api_key)
        self.model = model
        self.max_output_tokens = max_output_tokens
        self.fallback_model = fallback_model

    @property
    def model_name(self) -> str:
        return self.model

    def invoke(self, prompt: str) -> LLMMessage:
        # Primary: Responses API (text input)
        resp = self.client.responses.create(
            model=self.model,
            input=prompt,
            max_output_tokens=self.max_output_tokens,
        )
        content = getattr(resp, 'output_text', None)
        if not content:
            parts = []
            for item in getattr(resp, 'output', []) or []:
                if getattr(item, 'type', '') == 'message':
                    for c in getattr(item, 'content', []) or []:
                        if getattr(c, 'type', '') == 'output_text':
                            parts.append(getattr(c, 'text', '') or '')
            content = ''.join(parts)
        # Fallback: try Chat Completions with extra_body override
        if not content:
            try:
                cc = self.client.chat.completions.create(
                    model=self.model,
                    messages=[{"role": "user", "content": prompt}],
                    # Omit temperature for GPT-5 (default only)
                    extra_body={"max_completion_tokens": self.max_output_tokens},
                )
                content = (cc.choices[0].message.content or '')
            except Exception as e:
                logger.warning(f"Fallback to chat.completions failed: {e}")
        # Last resort: route to stable GPT-4o
        if not content and self.fallback_model:
            logger.info(f"Responses returned empty. Falling back to {self.fallback_model}.")
            backup = OpenAIChatLLM(model=self.fallback_model, api_key=self.client.api_key, temperature=0, max_tokens=self.max_output_tokens)
            content = backup.invoke(prompt).content
        return LLMMessage(content=(content or '').strip())
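For reference, the extraction loop above assumes a Responses API result where output is a list of items, message items carry content parts, and output_text parts hold the generated text. The following sketch replays that walk over a stand-in object (SimpleNamespace here is illustrative only, not the SDK's actual result types):

from types import SimpleNamespace

# Stand-in for a Responses API result whose output_text convenience field
# is empty, forcing the manual walk over output -> message -> output_text.
resp = SimpleNamespace(
    output_text=None,
    output=[
        SimpleNamespace(
            type='message',
            content=[SimpleNamespace(type='output_text', text='Hello from the Responses API.')],
        )
    ],
)

parts = []
for item in getattr(resp, 'output', []) or []:
    if getattr(item, 'type', '') == 'message':
        for c in getattr(item, 'content', []) or []:
            if getattr(c, 'type', '') == 'output_text':
                parts.append(getattr(c, 'text', '') or '')

print(''.join(parts))  # -> Hello from the Responses API.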
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| model | str | required | positional or keyword |
| api_key | Optional[str] | None | positional or keyword |
| max_output_tokens | int | 4096 | positional or keyword |
| fallback_model | str | 'gpt-4o' | positional or keyword |
Parameter Details
model: The OpenAI model identifier to use (e.g., 'gpt-5', 'gpt-5-mini'). This should be a model that supports the Responses API.
api_key: Optional OpenAI API key. If None, the OpenAI client will attempt to use the OPENAI_API_KEY environment variable.
max_output_tokens: Maximum number of tokens to generate in the response. Default is 4096. Controls the length of generated text.
fallback_model: Model to use as a last resort if all other methods fail. Default is 'gpt-4o', which is a stable, reliable model.
Return Value
Instantiating the class yields an OpenAIResponsesLLM instance. The invoke() method returns an LLMMessage object whose content attribute holds the generated text. The model_name property returns the primary model identifier as a string.
Class Interface
Methods
__init__(self, model: str, api_key: Optional[str] = None, max_output_tokens: int = 4096, fallback_model: str = 'gpt-4o')
Purpose: Initializes the OpenAI Responses API adapter with configuration parameters and creates an OpenAI client instance
Parameters:
model: The primary OpenAI model to use (e.g., 'gpt-5')
api_key: Optional API key; if None, uses the OPENAI_API_KEY environment variable
max_output_tokens: Maximum tokens to generate (default 4096)
fallback_model: Backup model to use if the primary fails (default 'gpt-4o')
Returns: None (constructor)
@property model_name(self) -> str
Purpose: Returns the name of the primary model being used
Returns: String containing the model identifier (e.g., 'gpt-5')
invoke(self, prompt: str) -> LLMMessage
Purpose: Sends a prompt to the LLM and returns the generated response, with automatic fallback handling if the primary method fails
Parameters:
prompt: The text prompt/question to send to the language model
Returns: LLMMessage object containing the generated text content in its 'content' attribute
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| client | OpenAI | The OpenAI client instance used to make API calls | instance |
| model | str | The primary model identifier to use for generation | instance |
| max_output_tokens | int | Maximum number of tokens to generate in responses | instance |
| fallback_model | str | The backup model identifier to use when primary methods fail | instance |
Dependencies
openai, logging, typing, dataclasses
Required Imports
from typing import Optional
import logging
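The source also references a module-level logger that is not among these imports; presumably llm_factory.py defines it along the usual lines (an assumption, as the definition falls outside the excerpted lines):

logger = logging.getLogger(__name__)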
Conditional/Optional Imports
These imports are only needed under specific conditions:
from openai import OpenAI
Condition: Required for instantiation and API calls, lazily imported in __init__
from dataclasses import dataclass
Condition: Required if LLMMessage is a dataclass defined elsewhere in the codebase
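LLMMessage itself is not shown in the excerpt; based on how invoke() constructs it (LLMMessage(content=...)) and how callers read its content attribute, a minimal sketch, assuming it is indeed a dataclass:

from dataclasses import dataclass

@dataclass
class LLMMessage:
    content: str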
Usage Example
# Basic usage
from llm_factory import OpenAIResponsesLLM
# Instantiate with API key
llm = OpenAIResponsesLLM(
model='gpt-5',
api_key='your-api-key-here',
max_output_tokens=2048,
fallback_model='gpt-4o'
)
# Get model name
print(llm.model_name) # Output: 'gpt-5'
# Invoke the model with a prompt
prompt = 'Explain quantum computing in simple terms.'
response = llm.invoke(prompt)
print(response.content)
# Using environment variable for API key
import os
os.environ['OPENAI_API_KEY'] = 'your-api-key'
llm_auto = OpenAIResponsesLLM(model='gpt-5')
response = llm_auto.invoke('What is machine learning?')
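Because each fallback tier only fires on empty output, invoke() can still return an LLMMessage with empty content if every tier comes back empty. A small defensive check costs little (a sketch, reusing the llm instance from above):

response = llm.invoke('Summarize the design doc in three bullets.')
if response.content:
    print(response.content)
else:
    # All tiers (Responses API, Chat Completions, fallback model) returned empty
    print('No content generated; consider retrying or checking service status.')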
Best Practices
- Always provide an API key either through the constructor or OPENAI_API_KEY environment variable
- The class implements a three-tier fallback strategy: Responses API → Chat Completions API → Stable fallback model
- Set max_output_tokens appropriately based on your use case to control costs and response length
- The fallback_model should be a stable, well-tested model (default gpt-4o is recommended)
- Monitor logs for fallback warnings to detect when the primary Responses API is failing (a logging sketch follows this list)
- The invoke() method is synchronous and will block until a response is received
- Empty responses trigger the automatic fallbacks; note that invoke() can still return an empty-content LLMMessage if every tier comes back empty
- The class creates a new OpenAI client instance on initialization, so reuse the same instance for multiple invocations
- Temperature is intentionally omitted for GPT-5 models (uses default only)
- The class depends on external LLMMessage and OpenAIChatLLM classes that must be available in the module
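To surface the fallback warnings mentioned above, enabling logging for the process is enough; a minimal sketch, assuming the module-level logger shown earlier and an API key in the environment:

import logging

# Show INFO-level records such as "Responses returned empty. Falling back to gpt-4o."
logging.basicConfig(level=logging.INFO)

llm = OpenAIResponsesLLM(model='gpt-5')
print(llm.invoke('Describe the fallback behavior.').content)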
Similar Components
AI-powered semantic similarity - components with related functionality:
- class OpenAIChatLLM (80.1% similar)
- class AzureOpenAIChatLLM (69.3% similar)
- class GPT5Validator (68.5% similar)
- class LLMClient_v1 (67.0% similar)
- class LLMClient_v1 (64.9% similar)