function extract_previous_reports_summary
Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.
/tf/active/vicechatdev/leexi/app.py
51 - 128
complex
Purpose
This function processes multiple previous meeting report files (various formats), extracts their text content, and uses an LLM to generate a structured summary focusing on action items, decisions, ongoing projects, stakeholders, issues, and deadlines. The summary provides continuity and context for new meetings by highlighting relevant information from past discussions.
Source Code
def extract_previous_reports_summary(file_paths):
    """Extract key information from previous meeting reports using document extractor and LLM"""
    if not file_paths:
        return ""
    try:
        # Use a lightweight model for preprocessing
        import openai
        client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
        combined_content = []
        # Extract content from each file using the document extractor
        for file_path in file_paths:
            try:
                logger.info(f"Extracting content from: {file_path}")
                # Use document extractor to get text content
                extracted_text = doc_extractor.extract_text(file_path)
                if extracted_text:
                    file_name = Path(file_path).name
                    combined_content.append(f"=== {file_name} ===\n{extracted_text}\n")
                else:
                    logger.warning(f"No text extracted from: {file_path}")
            except Exception as file_error:
                logger.error(f"Error processing file {file_path}: {str(file_error)}")
                # Try to read as plain text as fallback
                try:
                    with open(file_path, 'r', encoding='utf-8') as f:
                        content = f.read()
                        combined_content.append(f"=== {Path(file_path).name} ===\n{content}\n")
                except Exception as fallback_error:
                    logger.error(f"Fallback text extraction also failed for {file_path}: {str(fallback_error)}")
        if not combined_content:
            return "No content could be extracted from previous reports."
        full_content = "\n".join(combined_content)
        # Limit content to avoid token limits (roughly 8000 characters = ~2000 tokens)
        if len(full_content) > 8000:
            full_content = full_content[:8000] + "\n... (content truncated)"
        # Create preprocessing prompt
        preprocessing_prompt = f"""You are an expert meeting analyst. Extract key information from the following previous meeting reports to provide context for a new meeting.
Focus on extracting:
1. Outstanding action items and their current status
2. Previous decisions that may impact current discussions
3. Ongoing projects and their timelines
4. Key stakeholders and their roles
5. Critical issues requiring follow-up
6. Important dates and deadlines
Provide a concise summary (max 800 words) that will help contextualize a new meeting discussion.
Previous Meeting Reports:
{full_content}
Generate a structured summary focusing on continuity and context for the upcoming meeting."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # Use smaller model for preprocessing
            messages=[
                {"role": "system", "content": "You are an expert meeting analyst who extracts key contextual information from previous meeting reports."},
                {"role": "user", "content": preprocessing_prompt}
            ],
            max_tokens=1000,
            temperature=0.2
        )
        return response.choices[0].message.content
    except Exception as e:
        logger.error(f"Error extracting previous reports summary: {str(e)}")
        return f"Error processing previous reports: {str(e)}"
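The per-file step in the source above (try the document extractor, fall back to a plain UTF-8 read if it raises) can be isolated for testing. The following is a minimal sketch, not the author's code: `extract_with_fallback` and its `extractor` parameter are hypothetical names, with `extractor` standing in for `doc_extractor.extract_text`.

```python
from pathlib import Path
from typing import Callable, Optional


def extract_with_fallback(
    file_path: str,
    extractor: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a labelled text section for one file, or None if nothing was read.

    `extractor` is a hypothetical stand-in for doc_extractor.extract_text.
    """
    name = Path(file_path).name
    try:
        text = extractor(file_path)
    except Exception:
        # Same fallback as the source: retry the file as UTF-8 plain text.
        try:
            content = Path(file_path).read_text(encoding="utf-8")
        except Exception:
            return None
        return f"=== {name} ===\n{content}\n"
    # An extractor that succeeds but returns nothing yields no section.
    return f"=== {name} ===\n{text}\n" if text else None
```

Passing the extractor as an argument, rather than reading a module-level global, makes it possible to exercise both branches with a stub.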
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| file_paths | - | - | positional_or_keyword |
Parameter Details
file_paths: A list of file path strings pointing to previous meeting report documents, in any format that the DocumentExtractor can handle (PDF, DOCX, TXT, etc.). If the list is empty or None, the function returns an empty string immediately.
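Because the function logs and skips unreadable files rather than raising, a caller may prefer to filter the list first. A minimal sketch, assuming a hypothetical helper named `existing_report_paths` that is not part of the original module:

```python
from pathlib import Path
from typing import Iterable, List, Optional


def existing_report_paths(file_paths: Optional[Iterable[str]]) -> List[str]:
    """Keep only entries that point at regular files, preserving order."""
    if not file_paths:
        return []
    return [str(p) for p in file_paths if Path(p).is_file()]
```

The result can be passed directly as `file_paths`; an empty result leads to the same early return as an empty input.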
Return Value
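Returns a string. Errors are reported through the return value rather than raised, so the result is one of:
- A structured summary of key information from the previous reports (action items, decisions, projects, stakeholders, issues, and deadlines). The prompt asks for at most 800 words; the hard cap is max_tokens=1000 on the API call, so a long summary may end mid-sentence.
- An empty string if file_paths is empty or None.
- 'No content could be extracted from previous reports.' if no file yielded any text, whether because extraction returned nothing or because both extraction and the plain-text fallback failed.
- An error message starting with 'Error processing previous reports:' if an exception occurs outside the per-file handling, for example when the OpenAI call fails.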
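Since success and failure share the same return type, callers have to inspect the string. One way to do that is sketched below; `SummaryStatus` and `classify_summary` are hypothetical names introduced for illustration, and the two literals are copied from the source code.

```python
from enum import Enum
from typing import Optional

# Literals copied from the function's source.
NO_CONTENT_MESSAGE = "No content could be extracted from previous reports."
ERROR_PREFIX = "Error processing previous reports:"


class SummaryStatus(Enum):
    EMPTY = "empty"            # no file paths were given
    NO_CONTENT = "no_content"  # nothing could be extracted
    ERROR = "error"            # an exception was caught and stringified
    OK = "ok"                  # anything else is treated as a summary


def classify_summary(result: Optional[str]) -> SummaryStatus:
    """Map the function's return value to one of the four outcomes."""
    if not result:
        return SummaryStatus.EMPTY
    if result == NO_CONTENT_MESSAGE:
        return SummaryStatus.NO_CONTENT
    if result.startswith(ERROR_PREFIX):
        return SummaryStatus.ERROR
    return SummaryStatus.OK
```

A caller would then include the summary in a downstream prompt only when the status is `SummaryStatus.OK`.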
Dependencies
- openai
- pathlib
- os
- logging
Required Imports
import os
from pathlib import Path
Conditional/Optional Imports
These imports are only needed under specific conditions:
import openai
Condition: imported inside the function body after the empty-input check, so it is only required when file_paths is non-empty and the function goes on to call the OpenAI API
Required (conditional)
Usage Example
import os
from pathlib import Path
import logging
from document_extractor import DocumentExtractor
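# Setup: the function reads the module-level names `logger` and `doc_extractor`,
# so they must be defined in the same module as the function itself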
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
doc_extractor = DocumentExtractor()
# Use the function
file_paths = [
    '/path/to/meeting_report_1.pdf',
    '/path/to/meeting_report_2.docx',
    '/path/to/meeting_notes.txt'
]
summary = extract_previous_reports_summary(file_paths)
print(summary)
# Handle empty case
empty_summary = extract_previous_reports_summary([])
print(empty_summary)  # the function returned an empty string, so this prints a blank line
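The setup in this example assumes that the API key is present and the `openai` package is installed. If either is missing, the function does not raise; it returns an 'Error processing previous reports:' string. A small preflight check can surface the problem earlier. This is a sketch only, and `preflight_problems` is a hypothetical helper, not part of the original module.

```python
import importlib.util
import os
from typing import List, Mapping


def preflight_problems(
    env: Mapping[str, str] = os.environ,
    module_name: str = "openai",
) -> List[str]:
    """Return human-readable problems; an empty list means ready to call."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"module '{module_name}' is not installed")
    return problems
```

Both arguments are parameters so the check can be tested without touching the real environment.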
Best Practices
- Ensure the OPENAI_API_KEY environment variable is set before calling this function
- Initialize the global 'doc_extractor' (DocumentExtractor instance) and 'logger' objects before use
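- Be aware that the combined text of all files is truncated at 8000 characters before it is sent to the model, so with long or numerous reports the later files may be dropped entirely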
- The function uses gpt-4o-mini model which incurs API costs per call
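- Handle the return value appropriately: failures are returned as strings, not raised, so check for messages starting with 'Error processing previous reports:' and for the literal 'No content could be extracted from previous reports.' before treating the result as a summary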
- The function has fallback mechanisms: if DocumentExtractor fails, it attempts plain text reading
- Consider rate limiting if processing many file sets in quick succession due to OpenAI API limits
- File paths should be absolute or relative paths that are accessible from the execution context
- The function is designed for meeting reports but can work with any text-based documents
- Temperature is set to 0.2 for more deterministic outputs suitable for factual extraction
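The function truncates the joined text as a whole, so one long report can crowd out the others. An alternative that callers could apply to their own pre-extracted sections is to give each file an equal share of the budget. This is a sketch of that different technique, not what the function does; `truncate_per_file` is a hypothetical helper and the 8000-character default is taken from the source.

```python
from typing import List

# Same marker text the source appends after truncation.
TRUNCATION_MARKER = "\n... (content truncated)"


def truncate_per_file(sections: List[str], total_budget: int = 8000) -> str:
    """Trim each section to an equal share of the budget, then join them."""
    if not sections:
        return ""
    share = max(total_budget // len(sections), 1)
    trimmed = [
        s if len(s) <= share else s[:share] + TRUNCATION_MARKER
        for s in sections
    ]
    return "\n".join(trimmed)
```

As in the source, the marker is added on top of the budget, so the result can slightly exceed `total_budget`.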
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_mixed_previous_reports 68.6% similar
- class MeetingMinutesGenerator 64.0% similar
- class MeetingMinutesGenerator_v1 63.1% similar
- function main_v27 62.3% similar
- function main_v14 61.0% similar