test_attendee_extraction_comprehensive

function test_attendee_extraction_comprehensive

Maturity: 47

A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.

File:
/tf/active/vicechatdev/leexi/test_attendee_comprehensive.py

Lines:
12 - 104

Complexity:
moderate

Purpose

This test function exercises the EnhancedMeetingMinutesGenerator's ability to identify actual meeting attendees (speakers) from a transcript while filtering out people who are only mentioned in conversation, generic speaker labels, and meeting room systems. It prints per-speaker turn counts, marks each speaker as included or excluded, and then runs a short minutes generation to show the extracted attendees in the full pipeline. The function contains no assertions, so results are verified by reading the console output.

Source Code

def test_attendee_extraction_comprehensive():
    """Test the improved attendee extraction with before/after comparison"""
    
    print("🧪 COMPREHENSIVE ATTENDEE EXTRACTION TEST")
    print("=" * 50)
    
    # Initialize generator
    generator = EnhancedMeetingMinutesGenerator(model='gpt-4o')
    
    # Test with actual transcript
    transcript_path = '/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md'
    
    try:
        # Load transcript
        with open(transcript_path, 'r', encoding='utf-8') as f:
            transcript = f.read()
        
        # Extract metadata with improved logic
        metadata = generator.parse_transcript_metadata(transcript)
        
        print(f"📄 Transcript: {transcript_path}")
        print(f"📅 Meeting Date: {metadata['date']}")
        print(f"⏱️ Duration: {metadata['duration']}")
        print(f"👥 Actual Speakers Found: {len(metadata['speakers'])}")
        
        print("\n✅ IMPROVED ATTENDEE LIST:")
        for i, speaker in enumerate(metadata['speakers'], 1):
            print(f"  {i}. {speaker}")
        
        # Show what was EXCLUDED (mentioned but not speakers)
        print("\n🚫 CORRECTLY EXCLUDED (mentioned in conversation but not actual speakers):")
        excluded_names = [
            "Jean", "Koen", "Julie", "Vincent", "Javier", "Juana", "Mike", 
            "Pascal", "Manu", "Frank", "Daniel", "Ksenia", "Wim", "Morgan"
        ]
        
        for name in excluded_names:
            if name in transcript:
                print(f"  - {name} (mentioned in conversation)")
        
        # Show analysis of speaking patterns
        print("\n📊 SPEAKING PATTERN ANALYSIS:")
        speaker_pattern = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
        import re
        speaker_counts = {}
        
        for line in transcript.split('\n'):
            line = line.strip()
            if not line:
                continue
            match = re.match(speaker_pattern, line)
            if match:
                speaker = match.group(1).strip()
                if speaker and not re.match(r'^Speaker \d+$', speaker):
                    speaker_counts[speaker] = speaker_counts.get(speaker, 0) + 1
        
        print("  Speaker frequencies:")
        for speaker, count in sorted(speaker_counts.items(), key=lambda x: x[1], reverse=True):
            status = "✅ INCLUDED" if speaker in metadata['speakers'] else "🚫 EXCLUDED"
            print(f"    {speaker}: {count} times - {status}")
        
        print("\n💡 IMPROVEMENT SUMMARY:")
        print("  - Only actual speakers are included as attendees")
        print("  - People mentioned in conversation are correctly excluded")
        print("  - Generic speakers (Speaker 1, Speaker 2) are filtered out")
        print("  - Meeting room systems are filtered out")
        print("  - Frequency analysis ensures consistent speakers")
        
        # Test with a small sample generation
        print("\n🎯 TESTING INTEGRATION WITH MEETING MINUTES GENERATION...")
        
        # Generate a brief summary to test attendee integration
        brief_minutes = generator.generate_meeting_minutes_with_config(
            transcript=transcript[:2000],  # Use first 2000 chars for speed
            meeting_title="Test Meeting - Attendee Extraction",
            detail_level="concise",
            rigor_level="standard",
            action_focus="standard",
            output_style="professional"
        )
        
        # Extract attendee line from generated minutes
        for line in brief_minutes.split('\n'):
            if 'Attendees:' in line:
                print(f"  Generated attendees: {line.strip()}")
                break
        
        print("\n✅ ATTENDEE EXTRACTION TEST COMPLETED SUCCESSFULLY!")
        
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
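
The speaking pattern analysis in the source can be tried in isolation. The sketch below reuses the two regular expressions from the function unchanged; the helper name count_speaker_turns and the sample transcript are invented for illustration, and the real Leexi transcript layout may differ from this sample.

```python
import re

# Both patterns are copied from the source code above.
SPEAKER_PATTERN = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
GENERIC_LABEL = r'^Speaker \d+$'


def count_speaker_turns(transcript: str) -> dict:
    """Count header lines per named speaker (hypothetical helper)."""
    counts = {}
    for line in transcript.split('\n'):
        line = line.strip()
        if not line:
            continue
        match = re.match(SPEAKER_PATTERN, line)
        if match:
            speaker = match.group(1).strip()
            # Skip generic labels such as "Speaker 1", as the source does.
            if speaker and not re.match(GENERIC_LABEL, speaker):
                counts[speaker] = counts.get(speaker, 0) + 1
    return counts


# Invented sample lines; only the "Name at start - end" headers should match.
SAMPLE = """\
Alice Martin at 0:05 - 0:42
Let's start. Koen sent the release notes yesterday.
Speaker 1 at 0:43 - 0:50
Okay.
Bob at 1h02:10 - 1h03:00
I will follow up with Koen.
Alice Martin at 1h03:01 - 1h03:20
Thanks.
"""

turn_counts = count_speaker_turns(SAMPLE)
print(turn_counts)
```

In this sample Koen is mentioned twice but never appears in a header line, so he is not counted as a speaker.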

Return Value

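This function does not return a value (it implicitly returns None). It writes its results to stdout: the extracted attendees, the excluded names, the speaking pattern analysis, and the integration test output. Errors raised while reading the transcript or processing it are caught by the function's try/except block and printed with a traceback rather than propagated. The generator is constructed before the try block, so an exception raised by the EnhancedMeetingMinutesGenerator constructor does propagate to the caller.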
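
Because the function only prints, a test runner cannot detect a regression from it. One possible way to make the exclusion check machine-verifiable is sketched below. The helper check_attendees is hypothetical and not part of the source, and the speaker list stands in for metadata['speakers'] with invented values.

```python
def check_attendees(speakers, known_non_speakers):
    """Return names from known_non_speakers that appear in the speaker list.

    Hypothetical helper: comparison is exact and case-insensitive, so a
    first name will not match a full name such as "Koen Peeters".
    """
    speaker_set = {s.strip().lower() for s in speakers}
    return sorted(
        name for name in known_non_speakers
        if name.strip().lower() in speaker_set
    )


# Stand-in values for metadata['speakers'] (invented for this sketch).
speakers = ["Alice Martin", "Bob"]
# A subset of the excluded_names list from the source.
excluded_names = ["Jean", "Koen", "Julie"]

leaked = check_attendees(speakers, excluded_names)
assert leaked == [], f"Mentioned-only names listed as attendees: {leaked}"

# A speaker list that wrongly contains a mentioned-only name is reported.
leaked_bad = check_attendees(["Alice Martin", "Koen"], excluded_names)
print(leaked_bad)
```

An assertion like this would turn the printed "correctly excluded" section into a check that fails when a mentioned-only name reaches the attendee list.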

Dependencies

enhanced_meeting_minutes_generator
sys
json
re
traceback

Required Imports

import sys
from enhanced_meeting_minutes_generator import EnhancedMeetingMinutesGenerator
import json
import re
import traceback

Usage Example

# Ensure the transcript file exists and the API key is set
import os
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

# Import the test from its module (test_attendee_comprehensive.py, see File above).
# This assumes the leexi directory is on sys.path.
from test_attendee_comprehensive import test_attendee_extraction_comprehensive

# Run the comprehensive test
test_attendee_extraction_comprehensive()

# Expected output includes:
# - Number of speakers found
# - List of included attendees
# - List of correctly excluded names
# - Speaking pattern analysis with frequencies
# - Integration test with generated meeting minutes
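
Since the function prints any failure instead of raising it, checking the prerequisites before the call can make a failed run easier to diagnose. The following is a minimal sketch under stated assumptions: the helper preflight is hypothetical, and the OPENAI_API_KEY variable name is taken from the usage example above rather than confirmed against the generator's code.

```python
import os
import tempfile
from pathlib import Path


def preflight(transcript_path, env_var="OPENAI_API_KEY", environ=None):
    """Return a list of human-readable problems (hypothetical helper).

    An empty list means the transcript file exists and the credential
    variable is set to a non-empty value.
    """
    environ = os.environ if environ is None else environ
    problems = []
    if not Path(transcript_path).is_file():
        problems.append(f"Transcript not found: {transcript_path}")
    if not environ.get(env_var):
        problems.append(f"Environment variable {env_var} is not set")
    return problems


# Demonstration with a temporary file and explicit environment mappings.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "transcript.md"
    existing.write_text("Alice at 0:05 - 0:42\nHello.\n", encoding="utf-8")
    ok = preflight(existing, environ={"OPENAI_API_KEY": "placeholder"})
    missing = preflight(Path(tmp) / "absent.md", environ={})

print(ok)
print(missing)
```

Calling preflight with the transcript path used by the test and aborting when the returned list is non-empty would surface a missing file or key before any API call is made.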

Best Practices

Ensure the transcript file path is accessible before running the test
Set up proper API credentials for the EnhancedMeetingMinutesGenerator before execution
The test uses only the first 2000 characters of the transcript for the integration test to optimize speed
Review the console output carefully to understand the distinction between actual speakers and mentioned names
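The function wraps transcript loading, analysis, and generation in a single try/except block that prints the error and a full traceback; it does not re-raise, so a failed run is visible only in the console output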
The test validates both the extraction logic and its integration with the full meeting minutes generation pipeline
Speaker pattern matching uses regex to identify timestamp-based speaker entries in the transcript format
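According to the test's own summary output, parse_transcript_metadata filters out generic speaker labels (e.g., 'Speaker 1', 'Speaker 2') and meeting room systems; the frequency count inside the test itself only drops labels matching 'Speaker N'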
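
One further point when reading the "correctly excluded" output: the source checks mentions with `if name in transcript`, which is a substring test, so a short name can be reported as mentioned when it only occurs inside a longer word. A possible whole-word alternative is sketched below; the helper mentioned_names and the sample sentence are invented for illustration.

```python
import re


def mentioned_names(transcript, names):
    """Return the names that occur as whole words (hypothetical helper)."""
    found = []
    for name in names:
        # \b anchors the match at word boundaries; re.escape guards
        # against names that contain regex metacharacters.
        if re.search(r'\b' + re.escape(name) + r'\b', transcript):
            found.append(name)
    return found


# Invented sample text: "Jean" and "Frank" appear only inside longer names.
text = "Jeanette will sync with Koen. Franklin joined late."
names = ["Jean", "Koen", "Frank"]

substring_hits = [n for n in names if n in text]  # behaviour of the source
whole_word_hits = mentioned_names(text, names)

print(substring_hits)
print(whole_word_hits)
```

The substring check reports all three names here, while the whole-word check reports only Koen.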

Similar Components

AI-powered semantic similarity - components with related functionality:

function test_attendee_extraction 87.3% similar

A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.
From: /tf/active/vicechatdev/leexi/test_attendee_extraction.py

function test_mixed_previous_reports 64.6% similar

A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.
From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py

function test_session_detection 59.7% similar

A comprehensive test function that validates session detection capabilities from multiple sources including filenames, PDF files, and text patterns.
From: /tf/active/vicechatdev/e-ink-llm/test_session_detection.py

function test_llm_extraction 59.4% similar

A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.
From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py

function extract_previous_reports_summary 56.9% similar

Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.
From: /tf/active/vicechatdev/leexi/app.py
