🔍 Code Extractor

function extract_previous_reports_summary

Maturity: 51

Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

File: /tf/active/vicechatdev/leexi/app.py
Lines: 51 - 128
Complexity: complex

Purpose

This function processes multiple previous meeting report files (various formats), extracts their text content, and uses an LLM to generate a structured summary focusing on action items, decisions, ongoing projects, stakeholders, issues, and deadlines. The summary provides continuity and context for new meetings by highlighting relevant information from past discussions.

Source Code

def extract_previous_reports_summary(file_paths):
    """Extract key information from previous meeting reports using document extractor and LLM"""
    if not file_paths:
        return ""
    
    try:
        # Use a lightweight model for preprocessing
        import openai
        client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
        
        combined_content = []
        
        # Extract content from each file using the document extractor
        for file_path in file_paths:
            try:
                logger.info(f"Extracting content from: {file_path}")
                
                # Use document extractor to get text content
                extracted_text = doc_extractor.extract_text(file_path)
                
                if extracted_text:
                    file_name = Path(file_path).name
                    combined_content.append(f"=== {file_name} ===\n{extracted_text}\n")
                else:
                    logger.warning(f"No text extracted from: {file_path}")
                    
            except Exception as file_error:
                logger.error(f"Error processing file {file_path}: {str(file_error)}")
                # Try to read as plain text as fallback
                try:
                    with open(file_path, 'r', encoding='utf-8') as f:
                        content = f.read()
                        combined_content.append(f"=== {Path(file_path).name} ===\n{content}\n")
                except Exception as fallback_error:
                    logger.error(f"Fallback text extraction also failed for {file_path}: {str(fallback_error)}")
        
        if not combined_content:
            return "No content could be extracted from previous reports."
        
        full_content = "\n".join(combined_content)
        
        # Limit content to avoid token limits (roughly 8000 characters = ~2000 tokens)
        if len(full_content) > 8000:
            full_content = full_content[:8000] + "\n... (content truncated)"
        
        # Create preprocessing prompt
        preprocessing_prompt = f"""You are an expert meeting analyst. Extract key information from the following previous meeting reports to provide context for a new meeting.

Focus on extracting:
1. Outstanding action items and their current status
2. Previous decisions that may impact current discussions
3. Ongoing projects and their timelines
4. Key stakeholders and their roles
5. Critical issues requiring follow-up
6. Important dates and deadlines

Provide a concise summary (max 800 words) that will help contextualize a new meeting discussion.

Previous Meeting Reports:
{full_content}

Generate a structured summary focusing on continuity and context for the upcoming meeting."""

        response = client.chat.completions.create(
            model="gpt-4o-mini",  # Use smaller model for preprocessing
            messages=[
                {"role": "system", "content": "You are an expert meeting analyst who extracts key contextual information from previous meeting reports."},
                {"role": "user", "content": preprocessing_prompt}
            ],
            max_tokens=1000,
            temperature=0.2
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        logger.error(f"Error extracting previous reports summary: {str(e)}")
        return f"Error processing previous reports: {str(e)}"

Parameters

Name        Type  Default  Kind
file_paths  -     -        positional_or_keyword

Parameter Details

file_paths: A list of file path strings pointing to previous meeting report documents. Can be empty/None. Supports various document formats that the DocumentExtractor can handle (PDF, DOCX, TXT, etc.). If empty or None, returns an empty string immediately.

Return Value

Returns a string with one of four outcomes: (1) a structured summary (max 800 words) of key information from previous reports, covering action items, decisions, projects, stakeholders, issues, and deadlines; (2) an empty string if no file_paths were provided; (3) 'No content could be extracted from previous reports.' if extraction fails for every file; or (4) an error message beginning with 'Error processing previous reports:' if an exception occurs during processing.
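Because the non-summary outcomes are plain sentinel strings, callers must distinguish them by prefix or exact match. A minimal sketch of how that might be done; `classify_summary` is a hypothetical helper, not part of the source:

```python
def classify_summary(result: str) -> str:
    """Map extract_previous_reports_summary's return value to a status label.

    Hypothetical helper: the sentinel strings below are taken verbatim
    from the function's documented return values.
    """
    if not result:
        return "no_input"        # empty/None file_paths
    if result.startswith("Error processing previous reports:"):
        return "error"           # exception during processing
    if result == "No content could be extracted from previous reports.":
        return "empty"           # extraction failed for all files
    return "ok"                  # a real LLM-generated summary
```

This lets calling code branch on a label instead of repeating string comparisons at every call site.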

Dependencies

  • openai
  • pathlib
  • os
  • logging

Required Imports

import os
from pathlib import Path

Conditional/Optional Imports

These imports are only needed under specific conditions:

import openai

Condition: required (conditionally) when file_paths is non-empty and the function needs to call the OpenAI API.

Usage Example

import os
from pathlib import Path
import logging
from document_extractor import DocumentExtractor

# Setup
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
doc_extractor = DocumentExtractor()

# Use the function
file_paths = [
    '/path/to/meeting_report_1.pdf',
    '/path/to/meeting_report_2.docx',
    '/path/to/meeting_notes.txt'
]

summary = extract_previous_reports_summary(file_paths)
print(summary)

# Handle empty case
empty_summary = extract_previous_reports_summary([])
print(empty_summary)  # Empty input returns an empty string
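Files that fail both the DocumentExtractor path and the plain-text fallback are skipped silently, so a caller may want to pre-filter the path list first. A minimal sketch; `existing_reports` is a hypothetical helper, not part of the source:

```python
from pathlib import Path


def existing_reports(paths):
    """Keep only paths that exist and are regular files.

    Hypothetical pre-filter: avoids passing dead paths to
    extract_previous_reports_summary, which would log errors
    and silently drop them.
    """
    return [p for p in paths if Path(p).is_file()]
```

Filtering up front also makes it easy to warn the user about missing files before any API cost is incurred.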

Best Practices

  • Ensure the OPENAI_API_KEY environment variable is set before calling this function
  • Initialize the global 'doc_extractor' (DocumentExtractor instance) and 'logger' objects before use
  • Be aware that content is truncated at 8000 characters to avoid token limits
  • The function uses gpt-4o-mini model which incurs API costs per call
  • Handle the return value appropriately: check for error messages starting with 'Error processing previous reports:'
  • The function has fallback mechanisms: if DocumentExtractor fails, it attempts plain text reading
  • Consider rate limiting if processing many file sets in quick succession due to OpenAI API limits
  • File paths should be absolute or relative paths that are accessible from the execution context
  • The function is designed for meeting reports but can work with any text-based documents
  • Temperature is set to 0.2 for more deterministic outputs suitable for factual extraction
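The truncation rule noted in the best practices above can be reproduced in isolation. This sketch mirrors the function's own logic (8000 characters, roughly 4 characters per token) but is a standalone illustration, not an extension of the source:

```python
def truncate_for_llm(text: str, max_chars: int = 8000) -> str:
    """Cap content at max_chars to stay within token limits.

    Mirrors the function's truncation step: ~8000 characters
    is roughly 2000 tokens, and a marker is appended so the
    model knows the input was cut short.
    """
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n... (content truncated)"
```

Keeping the marker inside the truncated text is deliberate: it tells the model (and anyone reading logs) that the summary was built from partial input.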

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_mixed_previous_reports 68.6% similar

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.

    From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py
  • class MeetingMinutesGenerator 64.0% similar

    A class that generates professional meeting minutes from meeting transcripts using OpenAI's GPT-4o model, with capabilities to parse metadata, extract action items, and format output.

    From: /tf/active/vicechatdev/meeting_minutes_generator.py
  • class MeetingMinutesGenerator_v1 63.1% similar

    A class that generates professional meeting minutes from meeting transcripts using either OpenAI's GPT-4o or Google's Gemini AI models.

    From: /tf/active/vicechatdev/advanced_meeting_minutes_generator.py
  • function main_v27 62.3% similar

    Entry point function that orchestrates the process of loading a meeting transcript, generating structured meeting minutes using OpenAI's GPT-4o API, and saving the output to a file.

    From: /tf/active/vicechatdev/meeting_minutes_generator.py
  • function main_v14 61.0% similar

    Command-line interface function that orchestrates the generation of meeting minutes from a transcript file using either GPT-4o or Gemini LLM models.

    From: /tf/active/vicechatdev/advanced_meeting_minutes_generator.py