๐Ÿ” Code Extractor

function test_attendee_extraction_comprehensive

Maturity: 47

A comprehensive test function that exercises the attendee extraction logic for meeting transcripts, compares actual speakers with names that are only mentioned, and demonstrates integration with meeting minutes generation.

File:
/tf/active/vicechatdev/leexi/test_attendee_comprehensive.py
Lines:
12 - 104
Complexity:
moderate

Purpose

This test function checks the EnhancedMeetingMinutesGenerator's ability to identify actual meeting attendees (speakers) from a transcript while filtering out people merely mentioned in conversation, generic speaker labels, and meeting room systems. It also prints a per-speaker frequency analysis and demonstrates how attendee extraction integrates with the full meeting minutes generation pipeline. Results are reported through console output; the function contains no assertions.
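The difference between a speaker and a mentioned name can be illustrated with a minimal sketch. It assumes the transcript format implied by the regex in the source code below, where each turn starts with a "Name at start - end" header. The sample transcript, the find_speakers and find_mentions helpers, and the candidate names are hypothetical and are not part of EnhancedMeetingMinutesGenerator.

```python
import re

# Hypothetical sample in the "<Name> at <start> - <end>" header format.
SAMPLE = """\
Alice Smith at 0:00 - 0:12
Koen will send the report tomorrow.
Speaker 1 at 0:12 - 0:20
Sounds good.
Bob Jones at 0:20 - 0:31
I will ask Jeanette to review it.
"""

# Same header pattern as the one used in the test function.
SPEAKER_PATTERN = re.compile(r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+')
GENERIC_LABEL = re.compile(r'^Speaker \d+$')


def find_speakers(transcript):
    """Return named speakers in order of first appearance, skipping 'Speaker N'."""
    speakers = []
    for line in transcript.splitlines():
        match = SPEAKER_PATTERN.match(line.strip())
        if not match:
            continue
        name = match.group(1).strip()
        if name and not GENERIC_LABEL.match(name) and name not in speakers:
            speakers.append(name)
    return speakers


def find_mentions(transcript, candidates, speakers):
    """Return candidates that occur as whole words but never as a speaker."""
    return [
        name for name in candidates
        if name not in speakers
        and re.search(rf'\b{re.escape(name)}\b', transcript)
    ]


speakers = find_speakers(SAMPLE)
mentions = find_mentions(SAMPLE, ["Jean", "Koen", "Mike"], speakers)
print(speakers)
print(mentions)
```

The whole-word search is a deliberate choice in this sketch: a plain substring check such as "Jean" in SAMPLE would also match "Jeanette", so short names can be reported as mentioned when they are not.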

Source Code

def test_attendee_extraction_comprehensive():
    """Test the improved attendee extraction with before/after comparison"""
    
    print("🧪 COMPREHENSIVE ATTENDEE EXTRACTION TEST")
    print("=" * 50)
    
    # Initialize generator
    generator = EnhancedMeetingMinutesGenerator(model='gpt-4o')
    
    # Test with actual transcript
    transcript_path = '/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md'
    
    try:
        # Load transcript
        with open(transcript_path, 'r', encoding='utf-8') as f:
            transcript = f.read()
        
        # Extract metadata with improved logic
        metadata = generator.parse_transcript_metadata(transcript)
        
        print(f"📄 Transcript: {transcript_path}")
        print(f"📅 Meeting Date: {metadata['date']}")
        print(f"⏱️ Duration: {metadata['duration']}")
        print(f"👥 Actual Speakers Found: {len(metadata['speakers'])}")
        
        print("\n✅ IMPROVED ATTENDEE LIST:")
        for i, speaker in enumerate(metadata['speakers'], 1):
            print(f"  {i}. {speaker}")
        
        # Show what was EXCLUDED (mentioned but not speakers)
        print("\n🚫 CORRECTLY EXCLUDED (mentioned in conversation but not actual speakers):")
        excluded_names = [
            "Jean", "Koen", "Julie", "Vincent", "Javier", "Juana", "Mike", 
            "Pascal", "Manu", "Frank", "Daniel", "Ksenia", "Wim", "Morgan"
        ]
        
        for name in excluded_names:
            if name in transcript:
                print(f"  - {name} (mentioned in conversation)")
        
        # Show analysis of speaking patterns
        print("\n📊 SPEAKING PATTERN ANALYSIS:")
        speaker_pattern = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
        import re
        speaker_counts = {}
        
        for line in transcript.split('\n'):
            line = line.strip()
            if not line:
                continue
            match = re.match(speaker_pattern, line)
            if match:
                speaker = match.group(1).strip()
                if speaker and not re.match(r'^Speaker \d+$', speaker):
                    speaker_counts[speaker] = speaker_counts.get(speaker, 0) + 1
        
        print("  Speaker frequencies:")
        for speaker, count in sorted(speaker_counts.items(), key=lambda x: x[1], reverse=True):
            status = "✅ INCLUDED" if speaker in metadata['speakers'] else "🚫 EXCLUDED"
            print(f"    {speaker}: {count} times - {status}")
        
        print("\n💡 IMPROVEMENT SUMMARY:")
        print("  - Only actual speakers are included as attendees")
        print("  - People mentioned in conversation are correctly excluded")
        print("  - Generic speakers (Speaker 1, Speaker 2) are filtered out")
        print("  - Meeting room systems are filtered out")
        print("  - Frequency analysis ensures consistent speakers")
        
        # Test with a small sample generation
        print("\n🎯 TESTING INTEGRATION WITH MEETING MINUTES GENERATION...")
        
        # Generate a brief summary to test attendee integration
        brief_minutes = generator.generate_meeting_minutes_with_config(
            transcript=transcript[:2000],  # Use first 2000 chars for speed
            meeting_title="Test Meeting - Attendee Extraction",
            detail_level="concise",
            rigor_level="standard",
            action_focus="standard",
            output_style="professional"
        )
        
        # Extract attendee line from generated minutes
        for line in brief_minutes.split('\n'):
            if 'Attendees:' in line:
                print(f"  Generated attendees: {line.strip()}")
                break
        
        print("\n✅ ATTENDEE EXTRACTION TEST COMPLETED SUCCESSFULLY!")
        
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()

Return Value

This function does not return any value (implicitly returns None). It writes its results to stdout, including extracted attendees, excluded names, speaking pattern analysis, and integration test results. Errors raised while reading the file or processing the transcript are caught by the function's broad exception handler, which prints the error message and a traceback instead of propagating the exception to the caller.
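Because the function reports only through print and does not re-raise errors, a caller that needs a pass/fail signal has to inspect the output. The following is a minimal sketch using the standard library's contextlib.redirect_stdout. The run_and_check helper and the demo_test stand-in are hypothetical, and the success marker is assumed to be the final message printed by the function.

```python
import contextlib
import io


def run_and_check(test_func, success_marker="COMPLETED SUCCESSFULLY"):
    """Run a print-based test and report whether the success marker appeared."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        test_func()
    output = buffer.getvalue()
    return success_marker in output, output


def demo_test():
    """Hypothetical stand-in for test_attendee_extraction_comprehensive."""
    try:
        raise FileNotFoundError("transcript not found")
    except Exception as e:
        # Mirrors the documented behavior: the error is printed, not re-raised.
        print(f"Error: {e}")


passed, captured = run_and_check(demo_test)
print(passed)
```

Note that traceback.print_exc() writes to stderr, so a traceback would not appear in the captured text; only the printed error line does.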

Dependencies

  • enhanced_meeting_minutes_generator
  • sys
  • json
  • re
  • traceback

Required Imports

import sys
from enhanced_meeting_minutes_generator import EnhancedMeetingMinutesGenerator
import json
import re
import traceback

Usage Example

# Ensure the transcript file exists and the API key is set
import os
os.environ.setdefault('OPENAI_API_KEY', 'your-api-key-here')

# Import the test from its module (test_attendee_comprehensive.py).
# This assumes the module's directory is on sys.path.
from test_attendee_comprehensive import test_attendee_extraction_comprehensive

# Run the comprehensive test
test_attendee_extraction_comprehensive()

# Expected output includes:
# - Number of speakers found
# - List of included attendees
# - List of correctly excluded names
# - Speaking pattern analysis with frequencies
# - Integration test with generated meeting minutes
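
The two preconditions named at the top of the usage example can be checked before the test is run. This is a minimal sketch; check_prerequisites is a hypothetical helper, and OPENAI_API_KEY is assumed to be the variable the generator reads, as the usage example suggests.

```python
import os


def check_prerequisites(transcript_path, env_var="OPENAI_API_KEY"):
    """Return a list of problems; an empty list means the test can run."""
    problems = []
    if not os.path.isfile(transcript_path):
        problems.append(f"transcript not found: {transcript_path}")
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    return problems


# Path copied from the test function; adjust it for your environment.
problems = check_prerequisites(
    '/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md'
)
for problem in problems:
    print(f"- {problem}")
```

Running the check first gives a short, explicit message instead of the traceback the test would print when the file is missing.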

Best Practices

  • Ensure the transcript file path is accessible before running the test
  • Set up proper API credentials for the EnhancedMeetingMinutesGenerator before execution
  • The test uses only the first 2000 characters of the transcript for the integration test to optimize speed
  • Review the console output carefully to understand the distinction between actual speakers and mentioned names
  • The function wraps its body in a broad exception handler that prints the error and a traceback, so failures appear in the console rather than as raised exceptions
  • The test validates both the extraction logic and its integration with the full meeting minutes generation pipeline
  • Speaker pattern matching uses regex to identify timestamp-based speaker entries in the transcript format
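  • The test's own frequency analysis skips generic speaker labels (e.g., 'Speaker 1', 'Speaker 2'); filtering of meeting room systems is expected from the generator's parse_transcript_metadata and is not re-checked by the test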
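
The timestamp-based matching and frequency count described above can be tried in isolation. The sketch below reuses the regex from the source code on a few hypothetical header lines and counts named speakers with collections.Counter, which is swapped in here for the manual dictionary used in the test.

```python
import re
from collections import Counter

# Pattern copied from the test function.
SPEAKER_PATTERN = re.compile(r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+')

# Hypothetical lines; only the first three are speaker headers.
lines = [
    "Alice Smith at 0:05 - 0:17",
    "Alice Smith at 1h02:10 - 1h02:45",
    "Speaker 2 at 3:00 - 3:10",
    "Let's meet at 10 tomorrow",
]

counts = Counter()
for line in lines:
    match = SPEAKER_PATTERN.match(line.strip())
    if match:
        name = match.group(1).strip()
        # Skip generic labels such as 'Speaker 2', as the test does.
        if not re.match(r'^Speaker \d+$', name):
            counts[name] += 1

for name, count in counts.most_common():
    print(f"{name}: {count} times")
```

The last sample line contains " at " followed by a number but no start and end timestamps, so it is not treated as a speaker header.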

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_attendee_extraction 87.3% similar

    A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

    From: /tf/active/vicechatdev/leexi/test_attendee_extraction.py
  • function test_mixed_previous_reports 64.6% similar

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.

    From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py
  • function test_llm_extraction 59.4% similar

    A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
  • function extract_previous_reports_summary 56.9% similar

    Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

    From: /tf/active/vicechatdev/leexi/app.py
  • function handle_potential_truncation 56.9% similar

    Detects and handles truncated meeting minutes by comparing agenda items to discussion sections, then attempts regeneration with enhanced instructions to ensure completeness.

    From: /tf/active/vicechatdev/leexi/app.py
โ† Back to Browse