function test_attendee_extraction_comprehensive
A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.
File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py
Lines: 12 - 104
Complexity: moderate
Purpose
This test function exercises the EnhancedMeetingMinutesGenerator's ability to identify actual meeting attendees (speakers) in a transcript while filtering out people who are merely mentioned in conversation, generic speaker labels, and meeting room systems. It prints the extracted attendees, per-speaker frequency counts, and the attendee line produced by the full meeting minutes generation pipeline. The function contains no assertions, so its results have to be verified by reading the console output.
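The distinction between speakers and mentioned names can be illustrated with a minimal sketch. It reuses the timestamp pattern from the test's own frequency analysis; the transcript excerpt, the names in it, and the helper `count_speakers` are hypothetical, and the generator's internal logic may well differ.

```python
import re
from collections import Counter

# Same pattern as in the test source: "<name> at <start> - <end>"
SPEAKER_LINE = re.compile(r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+')
GENERIC_LABEL = re.compile(r'^Speaker \d+$')

# Hypothetical transcript excerpt; names and timestamps are made up.
SAMPLE_TRANSCRIPT = """\
Alice Smith at 0:05 - 0:12
Thanks everyone. Jean sent the report yesterday.
Bob Jones at 0:12 - 0:30
I will follow up with Jean and Koen.
Speaker 1 at 0:30 - 0:35
(inaudible)
Alice Smith at 0:35 - 0:50
Great, let's move on.
"""


def count_speakers(transcript):
    """Count how often each named speaker opens a timestamped turn."""
    counts = Counter()
    for raw_line in transcript.splitlines():
        match = SPEAKER_LINE.match(raw_line.strip())
        if not match:
            continue  # body text, not a speaker header
        name = match.group(1).strip()
        if name and not GENERIC_LABEL.match(name):
            counts[name] += 1
    return counts


speaker_counts = count_speakers(SAMPLE_TRANSCRIPT)
# Names that occur in the text but never open a turn are "mentioned only".
mentioned_only = [
    name for name in ("Jean", "Koen")
    if name in SAMPLE_TRANSCRIPT and name not in speaker_counts
]
print(dict(speaker_counts))
print(mentioned_only)
```

Only lines that open a timestamped turn contribute to the count, so a name that appears solely inside someone else's sentence never becomes an attendee.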
Source Code
def test_attendee_extraction_comprehensive():
    """Test the improved attendee extraction with before/after comparison"""
    print("🧪 COMPREHENSIVE ATTENDEE EXTRACTION TEST")
    print("=" * 50)
    # Initialize generator
    generator = EnhancedMeetingMinutesGenerator(model='gpt-4o')
    # Test with actual transcript
    transcript_path = '/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md'
    try:
        # Load transcript
        with open(transcript_path, 'r', encoding='utf-8') as f:
            transcript = f.read()
        # Extract metadata with improved logic
        metadata = generator.parse_transcript_metadata(transcript)
        print(f"📄 Transcript: {transcript_path}")
        print(f"📅 Meeting Date: {metadata['date']}")
        print(f"⏱️ Duration: {metadata['duration']}")
        print(f"👥 Actual Speakers Found: {len(metadata['speakers'])}")
        print("\n✅ IMPROVED ATTENDEE LIST:")
        for i, speaker in enumerate(metadata['speakers'], 1):
            print(f" {i}. {speaker}")
        # Show what was EXCLUDED (mentioned but not speakers)
        print("\n🚫 CORRECTLY EXCLUDED (mentioned in conversation but not actual speakers):")
        excluded_names = [
            "Jean", "Koen", "Julie", "Vincent", "Javier", "Juana", "Mike",
            "Pascal", "Manu", "Frank", "Daniel", "Ksenia", "Wim", "Morgan"
        ]
        for name in excluded_names:
            if name in transcript:
                print(f" - {name} (mentioned in conversation)")
        # Show analysis of speaking patterns
        print("\n📊 SPEAKING PATTERN ANALYSIS:")
        speaker_pattern = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
        import re
        speaker_counts = {}
        for line in transcript.split('\n'):
            line = line.strip()
            if not line:
                continue
            match = re.match(speaker_pattern, line)
            if match:
                speaker = match.group(1).strip()
                if speaker and not re.match(r'^Speaker \d+$', speaker):
                    speaker_counts[speaker] = speaker_counts.get(speaker, 0) + 1
        print(" Speaker frequencies:")
        for speaker, count in sorted(speaker_counts.items(), key=lambda x: x[1], reverse=True):
            status = "✅ INCLUDED" if speaker in metadata['speakers'] else "🚫 EXCLUDED"
            print(f" {speaker}: {count} times - {status}")
        print("\n💡 IMPROVEMENT SUMMARY:")
        print(" - Only actual speakers are included as attendees")
        print(" - People mentioned in conversation are correctly excluded")
        print(" - Generic speakers (Speaker 1, Speaker 2) are filtered out")
        print(" - Meeting room systems are filtered out")
        print(" - Frequency analysis ensures consistent speakers")
        # Test with a small sample generation
        print("\n🎯 TESTING INTEGRATION WITH MEETING MINUTES GENERATION...")
        # Generate a brief summary to test attendee integration
        brief_minutes = generator.generate_meeting_minutes_with_config(
            transcript=transcript[:2000],  # Use first 2000 chars for speed
            meeting_title="Test Meeting - Attendee Extraction",
            detail_level="concise",
            rigor_level="standard",
            action_focus="standard",
            output_style="professional"
        )
        # Extract attendee line from generated minutes
        for line in brief_minutes.split('\n'):
            if 'Attendees:' in line:
                print(f" Generated attendees: {line.strip()}")
                break
        print("\n✅ ATTENDEE EXTRACTION TEST COMPLETED SUCCESSFULLY!")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
Return Value
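This function does not return a value (it implicitly returns None). It writes its results to stdout: the extracted attendees, the excluded names, the speaking pattern analysis, and the integration test result. Errors raised while reading or processing the transcript are caught inside the function and printed with a traceback rather than propagated to the caller. Only the generator initialization, which runs before the try block, can raise to the caller.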
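Because the results only reach stdout, a caller that wants to inspect them programmatically has to capture the output. The following is a minimal sketch using the standard library's `contextlib.redirect_stdout`; `fake_report` is a hypothetical stand-in for the real test function so that the example runs without a transcript or an API key.

```python
import io
from contextlib import redirect_stdout


def run_and_capture(func):
    """Call func() and return (return_value, everything it printed)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue()


def fake_report():
    # Hypothetical stand-in that mimics the shape of the real output.
    print("Actual Speakers Found: 2")
    print("Error: transcript could not be read")


result, output = run_and_capture(fake_report)
# The real function prints an "Error: ..." line instead of raising,
# so a failure has to be detected from the captured text.
failed = any("Error:" in line for line in output.splitlines())
print(f"returned={result!r}, failed={failed}")
```

Passing `test_attendee_extraction_comprehensive` instead of `fake_report` would capture the real report in the same way.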
Dependencies
- enhanced_meeting_minutes_generator
- sys
- json
- re
- traceback
Required Imports
import sys
from enhanced_meeting_minutes_generator import EnhancedMeetingMinutesGenerator
import json
import re
import traceback
Usage Example
# Ensure the transcript file exists and the API key is set
import os
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
# Import the test function; the module name is taken from the file path
# of this component and must be importable from the current directory
from test_attendee_comprehensive import test_attendee_extraction_comprehensive
# Run the comprehensive test
test_attendee_extraction_comprehensive()
# Expected output includes:
# - Number of speakers found
# - List of included attendees
# - List of correctly excluded names
# - Speaking pattern analysis with frequencies
# - Integration test with generated meeting minutes
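The expected output is printed but never checked. A sketch of how the same expectations could be turned into assertions follows. The helper names and the sample speaker lists are hypothetical, and matching on whitespace-separated name tokens is an assumption about how names appear in `metadata['speakers']`.

```python
import re


def find_leaked_names(speakers, mentioned_only):
    """Return mentioned-only names that wrongly show up among the speakers."""
    # Compare against every whitespace-separated token of each speaker name.
    tokens = {token.lower() for speaker in speakers for token in speaker.split()}
    return sorted(name for name in mentioned_only if name.lower() in tokens)


def find_generic_labels(speakers):
    """Return placeholder labels such as 'Speaker 2' left in the list."""
    return [s for s in speakers if re.fullmatch(r'Speaker \d+', s)]


# Hypothetical extraction results, shaped like metadata['speakers'].
good_speakers = ["Alice Smith", "Bob Jones"]
bad_speakers = ["Alice Smith", "Jean Dupont", "Speaker 2"]
mentioned_only = ["Jean", "Koen"]

good_leaks = find_leaked_names(good_speakers, mentioned_only)
bad_leaks = find_leaked_names(bad_speakers, mentioned_only)
bad_generic = find_generic_labels(bad_speakers)

# In a real test these would be assertions on the generator's output.
assert good_leaks == []
assert find_generic_labels(good_speakers) == []
```

With checks like these, a regression in the extraction logic would fail the test run instead of requiring someone to read the console output.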
Best Practices
- Ensure the transcript file path is accessible before running the test
- Set up proper API credentials for the EnhancedMeetingMinutesGenerator before execution
- The test uses only the first 2000 characters of the transcript for the integration test to optimize speed
- Review the console output carefully to understand the distinction between actual speakers and mentioned names
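- The function wraps transcript loading and processing in a single try/except block that prints the error and a full traceback for debugging; the generator initialization is outside that block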
- The test validates both the extraction logic and its integration with the full meeting minutes generation pipeline
- Speaker pattern matching uses regex to identify timestamp-based speaker entries in the transcript format
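- The function's own frequency analysis skips generic speaker labels (e.g., 'Speaker 1', 'Speaker 2'); the filtering of meeting room systems is not performed in this function and is expected to happen inside the generator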
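The first two practices can be checked before the test runs. The sketch below assumes the generator reads its credentials from the `OPENAI_API_KEY` environment variable, as the usage example suggests; `preflight` is a hypothetical helper, and the temporary file only stands in for a real transcript.

```python
import os
import tempfile
from pathlib import Path


def preflight(transcript_path, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not Path(transcript_path).is_file():
        problems.append(f"transcript not found: {transcript_path}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


# Demonstration with a throwaway directory instead of the real transcript.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "transcript.md"
    existing.write_text("Alice Smith at 0:05 - 0:12\nHello.\n", encoding="utf-8")
    ok_problems = preflight(existing, env={"OPENAI_API_KEY": "placeholder"})
    bad_problems = preflight(Path(tmp) / "missing.md", env={})

print(ok_problems)
print(bad_problems)
```

Running such a check first gives a clear message for the two most likely setup failures instead of a traceback from deep inside the test.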
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_attendee_extraction 87.3% similar
- function test_mixed_previous_reports 64.6% similar
- function test_llm_extraction 59.4% similar
- function extract_previous_reports_summary 56.9% similar
- function handle_potential_truncation 56.9% similar