generate_simple_html_from_eml

function generate_simple_html_from_eml

Maturity: 45

Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.

File:
/tf/active/vicechatdev/msg_to_eml.py

Lines:
994 - 1148

Complexity:
moderate

Purpose

This function generates a comprehensive HTML view of an email message, preserving formatting, embedding inline images as base64 data URIs, and listing attachments. It handles both multipart and simple email structures, preferring HTML content when available but gracefully falling back to plain text. The output includes email headers (From, To, Cc, Date), subject, body content with inline images, and a list of attachments. This is useful for displaying emails in web interfaces, generating email previews, or creating standalone HTML archives of email messages.

Source Code

def generate_simple_html_from_eml(msg):
    """Generate cleaner, more reliable HTML from an email.message.Message object, including inline images."""
    import html
    import base64
    import re

    html_parts = []

    # Start with a clean, simple HTML template
    html_parts.append("""
    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="utf-8">
        <style>
            body { 
                font-family: Arial, sans-serif; 
                line-height: 1.5;
                margin: 20px;
                color: #333;
            }
            .header {
                margin-bottom: 20px;
                padding-bottom: 10px;
                border-bottom: 1px solid #ddd;
            }
            .header h2 { 
                margin: 0 0 10px 0;
                color: #444;
            }
            .meta {
                margin: 10px 0;
                font-size: 14px;
            }
            .meta div { margin: 5px 0; }
            .meta strong { color: #333; }
            .body { padding: 10px 0; }
            .attachments {
                margin-top: 15px;
                padding-top: 10px;
                border-top: 1px solid #eee;
            }
            .attachment {
                background-color: #f5f5f5;
                padding: 8px;
                margin-bottom: 5px;
                border-left: 3px solid #ddd;
            }
        </style>
    </head>
    <body>
    """)

    # Add header with subject
    subject = msg.get('Subject', '(No Subject)')
    html_parts.append(f'<div class="header"><h2>{html.escape(subject)}</h2>')

    # Add metadata
    html_parts.append('<div class="meta">')
    for header in ['From', 'To', 'Cc', 'Date']:
        if msg.get(header):
            html_parts.append(f'<div><strong>{header}:</strong> {html.escape(msg.get(header, ""))}</div>')
    html_parts.append('</div></div>')

    # Add body content
    html_parts.append('<div class="body">')

    # Find the best part to display (prefer HTML, then text)
    body_html = None
    body_text = None
    cid_images = {}

    # Extract inline images (Content-ID based)
    if msg.is_multipart():
        for part in msg.walk():
            content_type = part.get_content_type()
            disposition = part.get('Content-Disposition', '')

            # Handle inline images
            if content_type.startswith('image') and 'inline' in disposition:
                cid = part.get('Content-ID', '').strip('<>')
                img_data = part.get_payload(decode=True)
                if img_data:
                    img_type = content_type.split('/', 1)[1]
                    img_b64 = base64.b64encode(img_data).decode('ascii')
                    cid_images[f'cid:{cid}'] = f'data:image/{img_type};base64,{img_b64}'

            # Skip attachments
            if 'attachment' in disposition:
                continue

            # Get the payload
            payload = part.get_payload(decode=True)
            if not payload:
                continue

            charset = part.get_content_charset() or 'utf-8'
            try:
                decoded_payload = payload.decode(charset, errors='replace')
            except (LookupError, UnicodeDecodeError):
                # Unknown or unusable charset name: fall back to UTF-8
                decoded_payload = payload.decode('utf-8', errors='replace')

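            if content_type == 'text/html':
                # Keep the first HTML part, but keep walking: inline images
                # usually follow the HTML part and still need to be collected
                if body_html is None:
                    body_html = decoded_payload
            elif content_type == 'text/plain' and not body_text:
                body_text = decoded_payload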
    else:
        # Not multipart, just get the payload
        payload = msg.get_payload(decode=True)
        if payload:
            charset = msg.get_content_charset() or 'utf-8'
            try:
                decoded_payload = payload.decode(charset, errors='replace')
            except (LookupError, UnicodeDecodeError):
                # Unknown or unusable charset name: fall back to UTF-8
                decoded_payload = payload.decode('utf-8', errors='replace')

            if msg.get_content_type() == 'text/html':
                body_html = decoded_payload
            else:
                body_text = decoded_payload

    # Use HTML content if available, otherwise convert plain text to HTML
    if body_html:
        # Replace inline image references with embedded base64 data
        for cid_url, data_url in cid_images.items():
            body_html = body_html.replace(f'src="{cid_url}"', f'src="{data_url}"')

        html_parts.append(body_html)
    elif body_text:
        # Escape the text; <pre> with pre-wrap already preserves line breaks,
        # so adding <br> tags would double every line break
        html_body = html.escape(body_text)
        html_parts.append(f'<pre style="white-space: pre-wrap; font-family: inherit;">{html_body}</pre>')
    else:
        html_parts.append('<p>(No content)</p>')

    html_parts.append('</div>')

    # Add attachment list
    attachments = []
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_disposition() == 'attachment':
                filename = part.get_filename()
                if filename:
                    attachments.append(filename)

    if attachments:
        html_parts.append(f'<div class="attachments"><h3>Attachments ({len(attachments)})</h3>')
        for attachment in attachments:
            html_parts.append(f'<div class="attachment">{html.escape(attachment)}</div>')
        html_parts.append('</div>')

    html_parts.append('</body></html>')
    return "\n".join(html_parts)
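
The multipart branch above depends on the order in which msg.walk() yields parts. The following is a minimal sketch, not part of the original module, that builds a test message with the standard library's EmailMessage API and records that order. The function name build_message_with_inline_image, the addresses, and the fake image bytes are all hypothetical.

```python
from email.message import EmailMessage


def build_message_with_inline_image():
    """Build a small test message: plain text, HTML, and one inline image."""
    msg = EmailMessage()
    msg['Subject'] = 'Inline image demo'
    msg['From'] = 'sender@example.com'
    msg['To'] = 'recipient@example.com'
    msg.set_content('Plain-text fallback')
    msg.add_alternative(
        '<p>Logo: <img src="cid:logo@example.com"></p>', subtype='html'
    )
    # Attach the image to the HTML part; this turns it into multipart/related.
    html_part = msg.get_payload()[1]
    html_part.add_related(
        b'\x89PNG placeholder bytes',  # placeholder bytes, not a valid PNG
        'image', 'png', cid='<logo@example.com>',
    )
    return msg


msg = build_message_with_inline_image()
walk_order = [part.get_content_type() for part in msg.walk()]
image_part = [p for p in msg.walk() if p.get_content_type() == 'image/png'][0]
print(walk_order)
print(image_part['Content-ID'], image_part['Content-Disposition'])
```

In a message built this way the image part comes after the text/html part, so any logic that collects Content-ID images has to visit the parts that follow the HTML body.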

Parameters

Name	Type	Default	Kind
`msg`	-	-	positional_or_keyword

Parameter Details

msg: An email.message.Message object (or compatible EmailMessage object) representing a parsed email. This should be obtained from parsing an .eml file or MIME message using Python's email module. The object should contain email headers, body content, and potentially multipart structures with attachments and inline images.
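
How the message is parsed affects what the header lookups return. As a hedged illustration, the sketch below parses the same raw bytes twice: once with the default legacy policy and once with email.policy.default, which yields an EmailMessage and decodes RFC 2047 encoded words in headers. The RAW sample message is made up for this example.

```python
import email
from email import policy

# A made-up message whose Subject uses an RFC 2047 encoded word.
RAW = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9_menu?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Body text.\r\n"
)

# Legacy (compat32) policy: returns email.message.Message, headers stay raw.
legacy = email.message_from_bytes(RAW)
# Modern policy: returns email.message.EmailMessage, headers are decoded.
modern = email.message_from_bytes(RAW, policy=policy.default)

legacy_subject = legacy.get('Subject')
modern_subject = str(modern.get('Subject'))
print(type(legacy).__name__, repr(legacy_subject))
print(type(modern).__name__, repr(modern_subject))
```

With the legacy policy the encoded word would appear verbatim in the generated HTML header block, so parsing with policy.default is worth considering when subjects or sender names contain non-ASCII text.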

Return Value

Returns a string containing a complete, self-contained HTML document. The HTML includes embedded CSS styling, email metadata (subject, from, to, cc, date), the email body (either as HTML or converted plain text), inline images embedded as base64 data URIs, and a list of attachment filenames. The HTML is ready to be saved to a file or displayed in a web browser without external dependencies.
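
The data URI embedding described above can be sketched in isolation. This is an illustrative variant rather than the function's own code: it uses a regular expression so that both single-quoted and double-quoted src attributes are matched. The helper names to_data_uri and embed_cid_images are hypothetical.

```python
import base64
import re

# Matches src="cid:..." or src='cid:...' and remembers which quote was used.
CID_SRC = re.compile(r'''(src\s*=\s*)(["'])cid:([^"']+)\2''', re.IGNORECASE)


def to_data_uri(image_bytes, subtype):
    """Encode raw image bytes as a base64 data URI."""
    b64 = base64.b64encode(image_bytes).decode('ascii')
    return f'data:image/{subtype};base64,{b64}'


def embed_cid_images(body_html, images_by_cid):
    """Replace cid: image references with data URIs from images_by_cid.

    images_by_cid maps a bare Content-ID (no angle brackets) to a data URI.
    References to unknown Content-IDs are left untouched.
    """
    def swap(match):
        data_uri = images_by_cid.get(match.group(3))
        if data_uri is None:
            return match.group(0)
        quote = match.group(2)
        return f'{match.group(1)}{quote}{data_uri}{quote}'

    return CID_SRC.sub(swap, body_html)


images = {'logo@example.com': to_data_uri(b'abc', 'png')}
result = embed_cid_images("<img src='cid:logo@example.com'>", images)
print(result)
```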

Dependencies

html
base64
re

Required Imports

import html
import base64
import re
import email

Usage Example

import email

# Parse an email from a file. Binary mode lets the email parser honor the
# message's own charset declarations instead of assuming the file is UTF-8.
with open('example.eml', 'rb') as f:
    msg = email.message_from_binary_file(f)

# Generate HTML representation
html_output = generate_simple_html_from_eml(msg)

# Save to file
with open('email_output.html', 'w', encoding='utf-8') as f:
    f.write(html_output)

# Or parse from string
eml_string = '''From: sender@example.com
To: recipient@example.com
Subject: Test Email
Content-Type: text/plain

This is a test email.'''
msg = email.message_from_string(eml_string)
html_output = generate_simple_html_from_eml(msg)
print(html_output)
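
For comparison with the manual part walking in the function, the standard library's EmailMessage API offers get_body() and iter_attachments() for picking a display body and listing attachments. The sketch below is an alternative approach under the assumption that the message is an EmailMessage (for example, one parsed with policy=email.policy.default); it is not how generate_simple_html_from_eml works internally, and pick_body_and_attachments is a hypothetical helper name.

```python
from email.message import EmailMessage


def pick_body_and_attachments(msg):
    """Return (subtype, body_text, attachment_names) for an EmailMessage."""
    # Prefer the HTML body, then fall back to plain text.
    body_part = msg.get_body(preferencelist=('html', 'plain'))
    if body_part is None:
        subtype, body = None, None
    else:
        subtype = body_part.get_content_subtype()
        body = body_part.get_content()
    names = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]
    return subtype, body, names


# Build a small sample message to exercise the helper.
sample = EmailMessage()
sample['Subject'] = 'Report'
sample.set_content('Hello')
sample.add_alternative('<p>Hello</p>', subtype='html')
sample.add_attachment(
    b'%PDF- placeholder', maintype='application', subtype='pdf',
    filename='report.pdf',
)

subtype, body, names = pick_body_and_attachments(sample)
print(subtype, body.strip(), names)
```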

Best Practices

Pass a parsed email.message.Message object, for example one returned by email.message_from_file(), email.message_from_binary_file(), or email.message_from_string()
Undecodable bytes are substituted via errors='replace' rather than raising an exception, so a wrong charset declaration produces replacement characters in the output; fix the encoding at the source when possible
Inline images are embedded as base64 data URIs, which adds roughly one third to each image's size and can make the HTML large for emails with many or large images
The function lists attachment filenames only; it does not embed or extract attachment content
HTML bodies are inserted into the output as-is, without sanitization; sanitize or sandbox the result before displaying email from untrusted senders
When a multipart email contains both HTML and plain text, the HTML part is used
Content-ID references are replaced only when they appear exactly as src="cid:..." with double quotes, and only for image parts whose Content-Disposition contains inline
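
Because email HTML is inserted without sanitization, one way to reduce risk when showing it inside another page is to place it in a sandboxed iframe. The sketch below is an assumption-laden illustration, not part of the original module: wrap_untrusted_body and its height_px parameter are hypothetical, and it relies on the browser honoring the standard sandbox and srcdoc iframe attributes. It does not replace a proper HTML sanitizer.

```python
import html


def wrap_untrusted_body(body_html, height_px=600):
    """Wrap untrusted HTML in a sandboxed iframe via the srcdoc attribute."""
    # Escape &, <, > and quotes so the markup is safe inside an attribute value.
    escaped = html.escape(body_html, quote=True)
    # An empty sandbox attribute applies all restrictions, including no scripts.
    return (
        f'<iframe sandbox srcdoc="{escaped}" '
        f'style="width: 100%; height: {height_px}px; border: 0;"></iframe>'
    )


wrapped = wrap_untrusted_body('<script>alert("x")</script><p>Hi</p>')
print(wrapped)
```

The escaped markup is parsed only inside the iframe, so scripts in the email body are not executed in the embedding page.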

Similar Components

AI-powered semantic similarity - components with related functionality:

function generate_html_from_msg 82.0% similar

Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.
From: /tf/active/vicechatdev/msg_to_eml.py

function msg_to_eml 65.4% similar

Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.
From: /tf/active/vicechatdev/msg_to_eml.py

function msg_to_eml_alternative 65.0% similar

Converts Microsoft Outlook .msg files to .eml (email) format using the extract_msg library, preserving email headers, body content (plain text and HTML), and attachments.
From: /tf/active/vicechatdev/msg_to_eml.py

function eml_to_pdf 63.3% similar

Converts an .eml email file to PDF format, including the email body and all attachments merged into a single PDF document.
From: /tf/active/vicechatdev/msg_to_eml.py

function html_to_pdf 58.0% similar

Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.
From: /tf/active/vicechatdev/msg_to_eml.py
