
function test_language_detection_and_translation

Maturity: 44

A test function that validates multi-language query processing capabilities including language detection, translation, and query expansion across multiple supported languages.

File: /tf/active/vicechatdev/docchat/test_multilanguage.py
Lines: 17 - 76
Complexity: moderate

Purpose

This function is an integration test for the multi-language capabilities of a RAG (Retrieval-Augmented Generation) system. It exercises three steps for each query: detecting the query's language, translating the query into the supported languages, and generating extended query variations. The test queries are in English, Dutch, and French. The function logs the detected language, the translations, and up to ten extended queries next to the expected language code so the output can be reviewed manually; it does not assert that the detected language matches the expected one.

Source Code

def test_language_detection_and_translation():
    """Test language detection and translation"""
    logger.info("=" * 80)
    logger.info("Testing Multi-Language Query Expansion")
    logger.info("=" * 80)
    
    # Initialize RAG engine
    try:
        rag = DocChatRAG()
        logger.info("✓ RAG engine initialized")
        logger.info(f"Supported languages: {config.SUPPORTED_LANGUAGES}")
        logger.info(f"Language names: {[config.LANGUAGE_NAMES.get(l, l) for l in config.SUPPORTED_LANGUAGES]}")
    except Exception as e:
        logger.error(f"✗ Failed to initialize RAG engine: {e}")
        return False
    
    # Test queries in different languages
    test_queries = [
        ("What are the safety procedures for handling chemicals?", "en"),
        ("Wat zijn de veiligheidsprocedures voor het hanteren van chemicaliën?", "nl"),
        ("Quelles sont les procédures de sécurité pour manipuler des produits chimiques?", "fr"),
        ("How do I dispose of waste?", "en"),
    ]
    
    for query, expected_lang in test_queries:
        logger.info("")
        logger.info("-" * 80)
        logger.info(f"Testing query: {query}")
        logger.info(f"Expected language: {expected_lang}")
        logger.info("-" * 80)
        
        try:
            # Test language detection and translation
            result = rag._detect_and_translate_query(query)
            
            logger.info(f"✓ Detected language: {result['detected_language']}")
            logger.info("✓ Translations:")
            for lang, trans in result['translations'].items():
                lang_name = config.LANGUAGE_NAMES.get(lang, lang)
                logger.info(f"  - {lang_name} ({lang}): {trans}")
            
            # Test query expansion
            extended = rag._extend_query(query)
            logger.info(f"✓ Generated {len(extended)} extended queries:")
            for i, ext_query in enumerate(extended[:10], 1):  # Show first 10
                logger.info(f"  {i}. {ext_query}")
            if len(extended) > 10:
                logger.info(f"  ... and {len(extended) - 10} more")
                
        except Exception as e:
            logger.error(f"✗ Error processing query: {e}")
            import traceback
            traceback.print_exc()
    
    logger.info("")
    logger.info("=" * 80)
    logger.info("Multi-Language Test Complete")
    logger.info("=" * 80)
    
    return True

Return Value

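Returns a boolean: False if the RAG engine fails to initialize, True otherwise. Exceptions raised while processing an individual query are logged and do not change the return value, and the expected language code of each query is only logged, never compared with the detected one. A return value of True therefore means the test ran to completion, not that every query was handled correctly; the detailed results are in the log output.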
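
Since the return value does not reflect detection accuracy, a caller that needs a pass/fail signal has to compare the detected and expected languages itself. The following is a minimal sketch of such a check, not part of the original file. The names find_detection_mismatches and fake_detect are hypothetical, and the detector is assumed to return a dict with a 'detected_language' key, which is how the source code above reads the result of _detect_and_translate_query.

```python
from typing import Callable, Dict, List, Tuple


def find_detection_mismatches(
    detect: Callable[[str], Dict],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (query, expected, detected) for every case detected incorrectly."""
    mismatches = []
    for query, expected in cases:
        detected = detect(query)["detected_language"]
        if detected != expected:
            mismatches.append((query, expected, detected))
    return mismatches


def fake_detect(query: str) -> Dict:
    """Hypothetical stand-in for rag._detect_and_translate_query (no API calls).

    It only recognizes Dutch queries starting with "Wat " and labels
    everything else as English, so the French case below is misdetected.
    """
    lang = "nl" if query.startswith("Wat ") else "en"
    return {"detected_language": lang, "translations": {}}


cases = [
    ("How do I dispose of waste?", "en"),
    ("Wat zijn de veiligheidsprocedures?", "nl"),
    ("Quelles sont les procédures de sécurité?", "fr"),
]
mismatches = find_detection_mismatches(fake_detect, cases)
for query, expected, detected in mismatches:
    print(f"expected {expected}, detected {detected}: {query}")
```

With the real engine, rag._detect_and_translate_query could be passed as the detect argument, and the test could return "not mismatches" instead of an unconditional True.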

Dependencies

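  • logging
  • sys
  • traceback
  • config (project module providing SUPPORTED_LANGUAGES and LANGUAGE_NAMES)
  • rag_engine (project module providing DocChatRAG)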

Required Imports

import logging
import sys
import config
from rag_engine import DocChatRAG

Conditional/Optional Imports

These imports are only needed under specific conditions:

import traceback

Condition: optional. The import sits inside the except block and runs only when query processing raises an exception, in order to print the full traceback.

Usage Example

# Configure logging so the test's logger.info output is visible
import logging
logging.basicConfig(level=logging.INFO)

# Ensure the config module has the required settings (example values)
import config
config.SUPPORTED_LANGUAGES = ['en', 'nl', 'fr']
config.LANGUAGE_NAMES = {'en': 'English', 'nl': 'Dutch', 'fr': 'French'}

# Import the test from its module (test_multilanguage.py)
from test_multilanguage import test_language_detection_and_translation

# Run the test; False is returned only when the RAG engine fails to initialize
success = test_language_detection_and_translation()
if success:
    print("Test ran to completion; check the log for per-query errors")
else:
    print("Test failed during initialization")

Best Practices

  • Configure logging before calling this function; it writes all results through a module-level 'logger' variable defined in the test module
  • The config module must define SUPPORTED_LANGUAGES and LANGUAGE_NAMES before running this test
  • The DocChatRAG class must implement _detect_and_translate_query() and _extend_query() methods
  • This is a test function for validation and debugging and should not be used in production code
  • The function catches and logs per-query exceptions and continues with the next query, so one failing query does not stop the run; review the log, because such failures do not affect the return value
  • Consider running this test in a controlled environment as it may make external API calls for translation services
  • The test queries are hardcoded - modify them to test specific use cases or languages relevant to your application
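
To follow the advice about external API calls, the engine can be replaced by a stub while checking the test's control flow. The sketch below is an illustration under stated assumptions, not the project's own code. StubRAG is a hypothetical class, and its return shapes are inferred only from how the test reads them: a dict with 'detected_language' and 'translations' keys from _detect_and_translate_query, and a list of strings from _extend_query.

```python
from typing import Dict, List, Sequence


class StubRAG:
    """Hypothetical stand-in for DocChatRAG; makes no network calls."""

    def __init__(self, languages: Sequence[str] = ("en", "nl", "fr")):
        self.languages = list(languages)

    def _detect_and_translate_query(self, query: str) -> Dict:
        # Label every query as English and "translate" by tagging the text.
        return {
            "detected_language": "en",
            "translations": {lang: f"[{lang}] {query}" for lang in self.languages},
        }

    def _extend_query(self, query: str) -> List[str]:
        # One variation per language, matching the list the test iterates over.
        return [f"[{lang}] {query}" for lang in self.languages]


rag = StubRAG()
result = rag._detect_and_translate_query("How do I dispose of waste?")
extended = rag._extend_query("How do I dispose of waste?")
print(result["detected_language"], len(extended))
```

Assuming the test module imports the class by name (from rag_engine import DocChatRAG, as listed under Required Imports), the stub could be swapped in for a run with unittest.mock.patch("test_multilanguage.DocChatRAG", StubRAG).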

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_config 59.5% similar

    A test function that validates the presence and correctness of all required configuration settings for a multi-model RAG (Retrieval-Augmented Generation) system.

    From: /tf/active/vicechatdev/docchat/test_model_selection.py
  • function test_rag_engine 54.9% similar

    A test function that validates the RAG engine's ability to correctly instantiate different LLM models (OpenAI, Anthropic, Gemini) based on configuration settings.

    From: /tf/active/vicechatdev/docchat/test_model_selection.py
  • function test_single_vendor 54.2% similar

    Tests vendor enrichment by querying a RAG (Retrieval-Augmented Generation) system to find official contact information (email and VAT number) for a specified vendor using document search and web search capabilities.

    From: /tf/active/vicechatdev/find_email/test_enrichment.py
  • function test_adjusted_topk 52.6% similar

    A test function that validates the adjusted top_k calculation by testing multiple base values against the number of supported languages and logging the results.

    From: /tf/active/vicechatdev/docchat/test_multilanguage.py
  • function test_international_tax_ids 50.5% similar

    A test function that validates an LLM client's ability to extract tax identification numbers and business registration numbers from a multi-party international contract document across 8 different countries.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_international_tax_ids.py