🔍 Code Extractor

function create_summary

Maturity: 58

Creates a text summary using OpenAI's GPT models or returns a truncated version as fallback when API key is unavailable.

File: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/summarizer.py
Lines: 38-79
Complexity: moderate

Purpose

This function provides text summarization capabilities with configurable length limits. It primarily uses OpenAI's ChatCompletion API with GPT models to generate concise summaries. When no API key is provided, it gracefully degrades to simple text truncation for development/testing. The function includes error handling and supports future extensibility for other summarization models.

Source Code

def create_summary(text: str, config: Config) -> str:
    """
    Create a summary of the given text using the specified model.
    
    Args:
        text: Text to summarize
        config: Configuration object
        
    Returns:
        Summarized text
    """
    max_length = config.max_summary_length
    
    # For testing/development without API, just return a truncated version
    if not config.openai_api_key:
        print("WARNING: No OpenAI API key provided, returning truncated text as summary")
        return text[:max_length] + "..." if len(text) > max_length else text
    
    if config.summary_model.startswith("gpt"):
        # Use OpenAI API for summarization
        try:
            init_openai_client(config.openai_api_key)
            prompt = f"Please summarize the following text in about {max_length} words:\n\n{text}"
            
            response = openai.ChatCompletion.create(
                model=config.summary_model,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant that summarizes documents."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=int(max_length * 1.5),  # Allow some buffer
                temperature=0.3  # Lower temperature for more focused summaries
            )
            
            return response.choices[0].message["content"].strip()
        except Exception as e:
            print(f"Error in summarization: {e}")
            return text[:max_length] + "..." if len(text) > max_length else text
    else:
        # Add support for other summarization models (e.g., Hugging Face models)
        # This is a placeholder for future implementation
        raise NotImplementedError(f"Summarization using model {config.summary_model} is not implemented yet.")

Parameters

Name     Type     Default   Kind
text     str      -         positional_or_keyword
config   Config   -         positional_or_keyword

Parameter Details

text: The input text string to be summarized. Can be of any length, though very long texts may be truncated or chunked depending on model token limits. No explicit constraints on content or format.
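The chunking mentioned above is not implemented in the documented function itself; a minimal character-based splitter, offered here only as an illustrative sketch for pre-processing long inputs (the name chunk_text and the chunk size are assumptions, not part of the module), could look like:

```python
def chunk_text(text: str, chunk_chars: int = 8000) -> list[str]:
    # Naive fixed-width chunking for texts that exceed a model's context
    # window; each chunk could be summarized separately and the partial
    # summaries concatenated or re-summarized.
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```

A production splitter would typically break on sentence or paragraph boundaries and count tokens rather than characters.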

config: A Config object containing configuration settings including: openai_api_key (str or None), summary_model (str, e.g., 'gpt-3.5-turbo', 'gpt-4'), and max_summary_length (int, target length in words for the summary). The Config object must have these attributes properly initialized.
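Based on the three attributes the function reads, a minimal stand-in for src.config.Config can be sketched as a dataclass. The attribute names come from the documentation above; the dataclass form and defaults are assumptions for illustration, not the actual class definition:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    # None triggers the truncation fallback path in create_summary
    openai_api_key: Optional[str] = None
    # Must start with "gpt" or create_summary raises NotImplementedError
    summary_model: str = "gpt-3.5-turbo"
    # Target summary length in words (also used to derive max_tokens)
    max_summary_length: int = 100
```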

Return Value

Type: str

Returns a string containing the summarized text. On a successful OpenAI API call, the result is an AI-generated summary of approximately max_summary_length words. If the API key is missing or an error occurs, the function falls back to the original text truncated to max_summary_length characters with '...' appended (or the full text unchanged if it is already short enough). Note that the API path targets a length in words while the fallback slices characters, and an empty input yields an empty string on the fallback path.
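The fallback branch can be reproduced in isolation to see the truncation behavior. This helper simply mirrors the expression used in the source above (the standalone name fallback_summary is for illustration only):

```python
def fallback_summary(text: str, max_length: int) -> str:
    # Same expression as the no-API-key branch of create_summary:
    # truncate and mark with "...", or pass short text through unchanged.
    return text[:max_length] + "..." if len(text) > max_length else text
```

For example, fallback_summary("Hello world", 5) yields "Hello...", while fallback_summary("Hi", 5) returns "Hi" unchanged.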

Dependencies

  • openai
  • typing

Required Imports

from typing import List
from typing import Dict
from typing import Any
import os
import openai
from src.config import Config

Conditional/Optional Imports

These imports are only needed under specific conditions:

import openai

Required when config.openai_api_key is provided and config.summary_model starts with 'gpt'.

Usage Example

from src.config import Config
import openai

# Assuming init_openai_client is defined elsewhere
def init_openai_client(api_key):
    openai.api_key = api_key

# Create configuration
config = Config()
config.openai_api_key = 'sk-your-api-key-here'
config.summary_model = 'gpt-3.5-turbo'
config.max_summary_length = 100

# Text to summarize
long_text = '''Your long document text here that needs to be summarized.
This could be multiple paragraphs of content that you want condensed
into a shorter, more digestible format.'''

# Generate summary
summary = create_summary(long_text, config)
print(f"Summary: {summary}")

# Example without API key (fallback mode)
config_no_key = Config()
config_no_key.openai_api_key = None
config_no_key.max_summary_length = 50
truncated = create_summary(long_text, config_no_key)
print(f"Truncated: {truncated}")

Best Practices

  • Always provide a valid OpenAI API key in the Config object for production use; the truncation fallback is only suitable for development/testing
  • Set max_summary_length appropriately based on your use case; the function uses 1.5x this value for max_tokens to allow buffer space
  • Handle the fallback case (API key missing or API failure) explicitly; checking whether the returned text ends with '...' is only a heuristic and can misfire when the original text itself ends that way, so prefer logging or a status flag at the call site
  • Be aware that OpenAI API calls have associated costs; monitor usage especially with large volumes of text
  • The function uses temperature=0.3 for consistent, focused summaries; this is not configurable in the current implementation
  • For non-GPT models, the function raises NotImplementedError; ensure config.summary_model starts with 'gpt' or implement additional model support
  • Consider implementing retry logic or rate limiting for production environments to handle API failures gracefully
  • The init_openai_client function must be defined in the same scope or imported; ensure this dependency is available
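The retry advice above is not implemented by the function itself; a generic exponential-backoff wrapper, sketched here as an assumption (with_retries is not part of the documented module), could wrap the create_summary call:

```python
import time

def with_retries(call, attempts: int = 3, base_delay: float = 1.0):
    # Invoke `call` up to `attempts` times, sleeping with exponential
    # backoff between failures; re-raise the last exception if all fail.
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))
```

Usage would look like with_retries(lambda: create_summary(long_text, config)). In production you would likely narrow the caught exception types to transient API errors (rate limits, timeouts) rather than retrying everything.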

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class SummarizationConfig 62.3% similar

    A configuration wrapper class that manages settings for a text summarization model by encapsulating a SummarizationModel instance.

    From: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/models.py
  • class SummarizationModel 56.7% similar

    A Pydantic data model class that defines the configuration schema for a text summarization model, including model name, token limits, and temperature settings.

    From: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/models.py
  • function summarize_text 56.6% similar

    A deprecated standalone function that was originally designed to summarize groups of similar documents but now only returns the input documents unchanged with a deprecation warning.

    From: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/summarizer.py
  • function extract_previous_reports_summary 48.1% similar

    Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

    From: /tf/active/vicechatdev/leexi/app.py
  • class MyEmbeddingFunction_v2 47.3% similar

    A custom embedding function class that generates embeddings for text documents using OpenAI's embedding models, with automatic text summarization and token management for large documents.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py