function generate_simple_html_from_eml
Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.
/tf/active/vicechatdev/msg_to_eml.py
994 - 1148
moderate
Purpose
This function builds an HTML view of an email message. It renders the subject and headers (From, To, Cc, Date), the body, and a list of attachment filenames, and it embeds inline images as base64 data URIs. It handles both multipart and single-part messages, preferring HTML content when available and falling back to plain text otherwise. The result is useful for displaying emails in web interfaces, generating email previews, or creating standalone HTML archives of email messages.
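The "prefer HTML, fall back to plain text" rule can also be expressed with the standard library's `EmailMessage.get_body()`. This is a minimal sketch of that alternative, not the loop used in the source: `pick_body` is a hypothetical helper, and `get_body()` is only available on `EmailMessage` objects (for example those parsed with `policy=email.policy.default`), not on legacy `Message` objects.

```python
from email.message import EmailMessage


def pick_body(msg):
    """Return (subtype, text) for the preferred body part, or (None, None).

    Hypothetical helper: asks get_body() for HTML first, then plain text.
    """
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        return None, None
    return part.get_content_subtype(), part.get_content()


# A message carrying both a plain-text and an HTML alternative
both = EmailMessage()
both.set_content("Plain version")
both.add_alternative("<p>HTML version</p>", subtype="html")

# A message carrying plain text only
plain_only = EmailMessage()
plain_only.set_content("Only text")

html_choice = pick_body(both)         # the HTML alternative is selected
plain_choice = pick_body(plain_only)  # falls back to the plain-text body
```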
Source Code
def generate_simple_html_from_eml(msg):
    """Generate cleaner, more reliable HTML from an email.message.Message object, including inline images."""
    import html
    import base64
    import re

    html_parts = []

    # Start with a clean, simple HTML template
    html_parts.append("""
    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="utf-8">
        <style>
            body {
                font-family: Arial, sans-serif;
                line-height: 1.5;
                margin: 20px;
                color: #333;
            }
            .header {
                margin-bottom: 20px;
                padding-bottom: 10px;
                border-bottom: 1px solid #ddd;
            }
            .header h2 {
                margin: 0 0 10px 0;
                color: #444;
            }
            .meta {
                margin: 10px 0;
                font-size: 14px;
            }
            .meta div { margin: 5px 0; }
            .meta strong { color: #333; }
            .body { padding: 10px 0; }
            .attachments {
                margin-top: 15px;
                padding-top: 10px;
                border-top: 1px solid #eee;
            }
            .attachment {
                background-color: #f5f5f5;
                padding: 8px;
                margin-bottom: 5px;
                border-left: 3px solid #ddd;
            }
        </style>
    </head>
    <body>
    """)

    # Add header with subject
    subject = msg.get('Subject', '(No Subject)')
    html_parts.append(f'<div class="header"><h2>{html.escape(subject)}</h2>')

    # Add metadata
    html_parts.append('<div class="meta">')
    for header in ['From', 'To', 'Cc', 'Date']:
        if msg.get(header):
            html_parts.append(f'<div><strong>{header}:</strong> {html.escape(msg.get(header, ""))}</div>')
    html_parts.append('</div></div>')

    # Add body content
    html_parts.append('<div class="body">')

    # Find the best part to display (prefer HTML, then text)
    body_html = None
    body_text = None
    cid_images = {}

    # Extract inline images (Content-ID based)
    if msg.is_multipart():
        for part in msg.walk():
            content_type = part.get_content_type()
            disposition = part.get('Content-Disposition', '')

            # Handle inline images
            if content_type.startswith('image') and 'inline' in disposition:
                cid = part.get('Content-ID', '').strip('<>')
                img_data = part.get_payload(decode=True)
                if img_data:
                    img_type = content_type.split('/', 1)[1]
                    img_b64 = base64.b64encode(img_data).decode('ascii')
                    cid_images[f'cid:{cid}'] = f'data:image/{img_type};base64,{img_b64}'

            # Skip attachments
            if 'attachment' in disposition:
                continue

            # Get the payload
            payload = part.get_payload(decode=True)
            if not payload:
                continue

            charset = part.get_content_charset() or 'utf-8'
            try:
                decoded_payload = payload.decode(charset, errors='replace')
            except:
                decoded_payload = payload.decode('utf-8', errors='replace')

            if content_type == 'text/html':
                body_html = decoded_payload
                break
            elif content_type == 'text/plain' and not body_text:
                body_text = decoded_payload
    else:
        # Not multipart, just get the payload
        payload = msg.get_payload(decode=True)
        if payload:
            charset = msg.get_content_charset() or 'utf-8'
            try:
                decoded_payload = payload.decode(charset, errors='replace')
            except:
                decoded_payload = payload.decode('utf-8', errors='replace')

            if msg.get_content_type() == 'text/html':
                body_html = decoded_payload
            else:
                body_text = decoded_payload

    # Use HTML content if available, otherwise convert plain text to HTML
    if body_html:
        # Replace inline image references with embedded base64 data
        for cid_url, data_url in cid_images.items():
            body_html = body_html.replace(f'src="{cid_url}"', f'src="{data_url}"')
        html_parts.append(body_html)
    elif body_text:
        # Convert plain text to HTML with proper escaping
        html_body = html.escape(body_text).replace('\n', '<br>\n')
        html_parts.append(f'<pre style="white-space: pre-wrap; font-family: inherit;">{html_body}</pre>')
    else:
        html_parts.append('<p>(No content)</p>')

    html_parts.append('</div>')

    # Add attachment list
    attachments = []
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_disposition() == 'attachment':
                filename = part.get_filename()
                if filename:
                    attachments.append(filename)

    if attachments:
        html_parts.append(f'<div class="attachments"><h3>Attachments ({len(attachments)})</h3>')
        for attachment in attachments:
            html_parts.append(f'<div class="attachment">{html.escape(attachment)}</div>')
        html_parts.append('</div>')

    html_parts.append('</body></html>')

    return "\n".join(html_parts)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| msg | - | - | positional_or_keyword |
Parameter Details
msg: An email.message.Message object (or compatible EmailMessage object) representing a parsed email. This should be obtained from parsing an .eml file or MIME message using Python's email module. The object should contain email headers, body content, and potentially multipart structures with attachments and inline images.
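To make the expected shape of `msg` concrete, here is a minimal sketch that builds a multipart message in memory instead of parsing a file. The helper name `build_sample_message`, the addresses, the filename, and the placeholder byte strings are all illustrative assumptions; only the `EmailMessage` calls are standard library API.

```python
from email.message import EmailMessage


def build_sample_message():
    """Build a message with plain text, HTML, an inline image, and an attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Sample message"

    # Plain-text body plus an HTML alternative that references an inline image
    msg.set_content("Plain-text fallback.")
    msg.add_alternative('<p>Logo: <img src="cid:logo"></p>', subtype="html")

    # Attach the inline image to the HTML part (placeholder bytes, not a real PNG)
    html_part = msg.get_payload()[1]
    html_part.add_related(
        b"placeholder-image-bytes",
        maintype="image",
        subtype="png",
        cid="<logo>",
    )

    # Regular attachment (placeholder bytes, not a real PDF)
    msg.add_attachment(
        b"placeholder-pdf-bytes",
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    return msg


sample = build_sample_message()
part_types = [part.get_content_type() for part in sample.walk()]
attachment_names = [
    part.get_filename()
    for part in sample.walk()
    if part.get_content_disposition() == "attachment"
]
```

Listing `part_types` shows the order in which `walk()` visits parts, which is the order the function's part loop sees them.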
Return Value
Returns a string containing a complete, self-contained HTML document. The HTML includes embedded CSS styling, email metadata (subject, from, to, cc, date), the email body (either as HTML or converted plain text), inline images embedded as base64 data URIs, and a list of attachment filenames. The HTML is ready to be saved to a file or displayed in a web browser without external dependencies.
Dependencies
- html
- base64
- re
Required Imports
import html
import base64
import re
import email
Usage Example
import email

# Parse an email from a file (binary mode lets the parser handle encodings)
with open('example.eml', 'rb') as f:
    msg = email.message_from_binary_file(f)

# Generate HTML representation
html_output = generate_simple_html_from_eml(msg)

# Save to file
with open('email_output.html', 'w', encoding='utf-8') as f:
    f.write(html_output)

# Or parse from string (a blank line separates the headers from the body)
eml_string = '''From: sender@example.com
To: recipient@example.com
Subject: Test Email
Content-Type: text/plain

This is a test email.'''
msg = email.message_from_string(eml_string)
html_output = generate_simple_html_from_eml(msg)
print(html_output)
Best Practices
- Pass a parsed email.message.Message object, for example one returned by email.message_from_file() or email.message_from_string()
- Payloads are decoded with errors='replace', so undecodable bytes become replacement characters instead of raising; correctly encoded source emails still give the best output
- Inline images are embedded as base64 data URIs; base64 adds roughly one third to the size of each image, so emails with many or large images produce large HTML files
- Only attachment filenames are listed; attachment content is neither embedded nor extracted
- HTML bodies are inserted without sanitization. Sanitize the output, or render it in a sandboxed context, before displaying email from untrusted senders
- When a multipart email has both HTML and plain-text parts, the HTML part is used
- Content-ID references written as src="cid:..." (double-quoted) are replaced with base64 data URIs, but only for inline images visited before the first text/html part, because the part loop stops there; images that follow the HTML part keep their cid: references
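One way to collect inline images regardless of where they sit relative to the HTML part is a dedicated pass over `msg.walk()`. The sketch below is not the source's implementation: `collect_cid_images` and `embed_cid_images` are hypothetical helpers, the image bytes are a placeholder, and the regular expression is an assumption that accepts single- or double-quoted `src` attributes.

```python
import base64
import re
from email.message import EmailMessage


def collect_cid_images(msg):
    """Map each Content-ID to a base64 data URI, visiting every part."""
    images = {}
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        cid = part.get("Content-ID", "").strip().strip("<>")
        data = part.get_payload(decode=True)
        if cid and data:
            b64 = base64.b64encode(data).decode("ascii")
            images[cid] = f"data:{part.get_content_type()};base64,{b64}"
    return images


def embed_cid_images(body_html, images):
    """Replace src="cid:..." (either quote style) with the matching data URI."""
    def swap(match):
        quote, cid = match.group(1), match.group(2)
        # Unknown Content-IDs are left untouched
        return f"src={quote}{images.get(cid, 'cid:' + cid)}{quote}"

    return re.sub(r"""src=(["'])cid:(.*?)\1""", swap, body_html)


# Demo: the image part comes after the HTML part, as in multipart/related
msg = EmailMessage()
msg.set_content('<p><img src="cid:logo"></p>', subtype="html")
msg.add_related(b"placeholder-image-bytes", maintype="image", subtype="png", cid="<logo>")

images = collect_cid_images(msg)
result = embed_cid_images(msg.get_payload()[0].get_content(), images)
```

Because the images are gathered before any replacement happens, the position of the image parts in the MIME tree no longer matters.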
Similar Components
AI-powered semantic similarity - components with related functionality:
- function generate_html_from_msg (82.0% similar)
- function msg_to_eml (65.4% similar)
- function msg_to_eml_alternative (65.0% similar)
- function eml_to_pdf (63.3% similar)
- function html_to_pdf (58.0% similar)