🔍 Code Extractor

function validate_azure_token_v1

Maturity: 54

Validates an Azure AD token by parsing the JWT id_token and extracting user information such as user ID, email, name, and preferred username.

File: /tf/active/vicechatdev/docchat/auth/azure_auth.py
Lines: 168 - 231
Complexity: moderate

Purpose

This function extracts user profile information from Azure AD tokens obtained through an OAuth flow. It checks that token_data contains an access_token, splits the id_token into its three JWT parts, base64url-decodes the payload, and reads the standard Azure AD claims (oid, sub, email, upn, name, preferred_username). The JWT signature is not verified. If the id_token is missing, the function returns placeholder user information; if the access_token is missing, the JWT is malformed, or decoding fails, it returns None. It is typically called after the token exchange with Azure AD to identify the signed-in user in a web application.

Source Code

def validate_azure_token(token_data: dict) -> dict:
    """
    Validate an Azure AD token and extract user information.
    
    Parameters:
        token_data (dict): Token data from get_token_from_code
    
    Returns:
        dict: User information extracted from the token, or None if validation fails
    """
    try:
        logger.info("Validating Azure token")
        
        if not token_data or 'access_token' not in token_data:
            logger.error("Invalid token data - missing access_token")
            return None
            
        # Parse the ID token to extract user information
        if 'id_token' in token_data:
            # Parse the JWT without full validation
            jwt_parts = token_data['id_token'].split('.')
            if len(jwt_parts) != 3:
                logger.error("Invalid JWT format in id_token")
                return None
                
            # Decode the payload (second part)
            payload_bytes = jwt_parts[1].encode('utf-8')
            
            # Add padding if needed
            padding_needed = len(payload_bytes) % 4
            if padding_needed:
                payload_bytes += b'=' * (4 - padding_needed)
                
            try:
                decoded_bytes = base64.urlsafe_b64decode(payload_bytes)
                claims = json.loads(decoded_bytes.decode('utf-8'))
                
                # Extract user info from claims
                user_info = {
                    'user_id': claims.get('oid', claims.get('sub', '')),  # Object ID or Subject
                    'email': claims.get('email', claims.get('upn', claims.get('preferred_username', ''))),
                    'name': claims.get('name', ''),
                    'preferred_username': claims.get('preferred_username', '')
                }
                
                logger.info(f"Successfully extracted user info: {user_info['email']}")
                return user_info
                
            except Exception as decode_error:
                logger.error(f"Error decoding id_token: {decode_error}")
                return None
        
        # If no id_token, use minimal user info
        logger.warning("No id_token found, using minimal user info")
        return {
            'user_id': 'unknown',
            'email': 'unknown@example.com',
            'name': 'Unknown User',
            'preferred_username': 'unknown'
        }
    
    except Exception as e:
        logger.error(f"Error validating Azure token: {e}", exc_info=True)
        return None

Parameters

Name         Type   Default   Kind
token_data   dict   -         positional_or_keyword

Parameter Details

token_data: A dictionary containing Azure AD token information, expected to have 'access_token' and optionally 'id_token' keys. The 'id_token' should be a JWT (JSON Web Token) string in standard three-part format (header.payload.signature). This is typically the response from an Azure AD token endpoint or from a function like get_token_from_code.
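
To make the expected shape of token_data concrete, here is a minimal sketch that builds an unsigned, JWT-shaped id_token for local testing. The helper name make_unsigned_id_token and the claim values are hypothetical and not part of the module; a real id_token comes from Azure AD and carries a genuine signature.

```python
import base64
import json


def make_unsigned_id_token(claims: dict) -> str:
    """Build a three-part JWT-shaped string for local testing only (hypothetical helper)."""

    def b64url(data: bytes) -> str:
        # JWT segments use URL-safe base64 with the '=' padding stripped.
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

    header = b64url(json.dumps({"typ": "JWT", "alg": "none"}).encode("utf-8"))
    payload = b64url(json.dumps(claims).encode("utf-8"))
    # The third segment is a dummy; validate_azure_token never inspects it.
    return f"{header}.{payload}.signature"


token_data = {
    # Only the presence of this key is checked by validate_azure_token.
    "access_token": "placeholder-access-token",
    "id_token": make_unsigned_id_token(
        {"oid": "12345678-1234-1234-1234-1234567890ab", "name": "John Doe"}
    ),
}
```

Because the padding is stripped when the token is built, this also exercises the padding-restoration step in the function's decoding path.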

Return Value

Type: dict or None (the signature is annotated as dict, but the function returns None on failure)

On success, returns a dictionary with the keys 'user_id' (Azure object ID, falling back to the subject claim), 'email' (user's email address), 'name' (display name), and 'preferred_username' (username). Any of these values may be an empty string if the corresponding claims are absent. Returns None if validation fails: missing access_token, invalid JWT format, or a decoding error. If id_token is missing but access_token is present, returns a dictionary of placeholder values ('unknown', 'unknown@example.com', 'Unknown User', 'unknown').
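
The outcomes described above can be told apart with a small helper. This is only a sketch: classify_result and its labels are hypothetical names, and the placeholder check relies on the literal 'unknown' user_id that the function returns when id_token is absent.

```python
from typing import Optional

# Literal value validate_azure_token uses for user_id when id_token is missing.
PLACEHOLDER_USER_ID = "unknown"


def classify_result(user_info: Optional[dict]) -> str:
    """Map a validate_azure_token return value to a coarse outcome label."""
    if user_info is None:
        # Missing access_token, malformed JWT, or decoding error.
        return "failed"
    if user_info.get("user_id") == PLACEHOLDER_USER_ID:
        # No id_token was present; the dictionary holds placeholder values.
        return "placeholder"
    if not user_info.get("user_id"):
        # Token decoded, but neither 'oid' nor 'sub' was present in the claims.
        return "incomplete"
    return "identified"
```

A caller would typically allow a login to proceed only for the "identified" outcome, since a placeholder dictionary is truthy and passes a plain `if user_info:` check.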

Dependencies

  • base64
  • json
  • logging
  • msal

Required Imports

import base64
import json
import logging

Conditional/Optional Imports

These imports are only needed under specific conditions:

import msal

Condition: Part of the module's Azure AD authentication dependencies; this function does not use it directly.

Required (conditional)

Usage Example

import base64
import json
import logging

# Setup logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
logger.addHandler(handler)

# Example token data from Azure AD
token_data = {
    'access_token': 'eyJ0eXAiOiJKV1QiLCJhbGc...',
    'id_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik1yNS1BVWliZkJpaTdOZDFqQmViYXhib1hXMCJ9.eyJvaWQiOiIxMjM0NTY3OC0xMjM0LTEyMzQtMTIzNC0xMjM0NTY3ODkwYWIiLCJlbWFpbCI6InVzZXJAZXhhbXBsZS5jb20iLCJuYW1lIjoiSm9obiBEb2UiLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJqb2huLmRvZUBleGFtcGxlLmNvbSJ9.signature'
}

# Validate token and extract user info
# (validate_azure_token must be defined or imported in this scope)
user_info = validate_azure_token(token_data)

if user_info:
    print(f"User ID: {user_info['user_id']}")
    print(f"Email: {user_info['email']}")
    print(f"Name: {user_info['name']}")
    print(f"Username: {user_info['preferred_username']}")
else:
    print("Token validation failed")

Best Practices

  • This function decodes the JWT payload without verifying its signature, so the extracted claims are only as trustworthy as the channel the token arrived on; do not rely on it as the sole security mechanism
  • Always ensure the token_data comes from a trusted source (e.g., directly from Azure AD token endpoint)
  • The function returns None on critical errors, so always check the return value before using user_info
  • A logger instance must be configured in the module scope before calling this function
  • For user_id the function prefers 'oid' and falls back to 'sub'; for email it prefers 'email', then 'upn', then 'preferred_username'. If none of the claims is present, the value is an empty string
  • Consider implementing full JWT validation with signature verification for production security-critical applications
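  • Handle the fallback case explicitly: when id_token is missing, the returned dictionary holds placeholder values and is truthy, so a plain `if user_info:` check does not detect it; compare user_id against 'unknown' instead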
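
As one illustration of layering extra checks on top of the decoded claims, the sketch below rejects expired tokens and tokens issued for a different audience. It assumes the caller has access to the decoded claims dictionary (validate_azure_token as shown does not return it), and check_basic_claims and expected_audience are hypothetical names. These checks complement signature verification; they do not replace it.

```python
import time
from typing import Optional


def check_basic_claims(
    claims: dict, expected_audience: str, now: Optional[float] = None
) -> bool:
    """Return True if the claims are unexpired and addressed to expected_audience."""
    now = time.time() if now is None else now

    # 'exp' is the expiry time in seconds since the Unix epoch.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False

    # 'aud' may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences
```

The optional now parameter exists only to make the check deterministic in tests; in normal use it can be omitted.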

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function validate_azure_token 96.0% similar

    Validates an Azure AD token by decoding the JWT id_token and extracting user information such as email, name, and object ID.

    From: /tf/active/vicechatdev/CDocs/auth/azure_auth.py
  • function test_azure_token 60.4% similar

    Tests Azure AD authentication by attempting to acquire an OAuth2 access token using client credentials flow for Microsoft Graph API access.

    From: /tf/active/vicechatdev/SPFCsync/diagnose_sharepoint.py
  • function validate_azure_client_secret 55.9% similar

    Validates an Azure client secret by checking for placeholder values, minimum length requirements, and common invalid patterns.

    From: /tf/active/vicechatdev/SPFCsync/validate_config.py
  • function auth_callback_v2 54.2% similar

    Flask route handler that processes OAuth 2.0 callback from Azure AD, exchanges authorization code for access tokens, and establishes user session.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function azure_callback 53.7% similar

    OAuth 2.0 callback endpoint for Azure AD authentication that exchanges authorization codes for access tokens and establishes user sessions.

    From: /tf/active/vicechatdev/docchat/app.py