function msg_to_eml
Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.
File: /tf/active/vicechatdev/msg_to_eml.py
Lines: 43 - 150
Complexity: moderate
Purpose
This function converts a proprietary Microsoft Outlook .msg file into the widely supported .eml format. It carries over the sender and recipient headers (From, To, Cc, Bcc), subject, date, Message-ID, message body (plain text, plus HTML when present), and file attachments. Failures are logged and reported through the return value rather than raised, and missing data falls back to defaults: the current time for an absent or unreadable date, 'attachment' for an unnamed attachment, and 'application/octet-stream' for an unknown MIME type. It is particularly useful for email migration, archival, or integration with email systems that don't support the .msg format.
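The date fallback can be looked at in isolation. The sketch below mirrors the try/except pattern used in the source code; the helper name `date_header` is hypothetical and is not part of the module.

```python
from email.utils import formatdate


def date_header(value):
    """Return an RFC 2822 date string for value, or for 'now' if value is unusable.

    Hypothetical helper: value is expected to behave like datetime.datetime,
    i.e. to offer a .timestamp() method.
    """
    try:
        return formatdate(value.timestamp(), localtime=True)
    except (AttributeError, TypeError):
        # value is None, a plain string, or otherwise not datetime-like
        return formatdate(localtime=True)
```

A string or None has no usable `timestamp()` method, so both end up in the fallback branch and the header gets the current local time.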
Source Code
def msg_to_eml(msg_path, eml_path):
    """Convert a .msg file to .eml format preserving all content and features"""
    try:
        # Check if input file exists
        if not os.path.exists(msg_path):
            logger.error(f"Input file not found: {msg_path}")
            return False

        # Load the .msg file
        logger.info(f"Opening .msg file: {msg_path}")
        msg = extract_msg.Message(msg_path)

        # Create a new EmailMessage object
        eml = EmailMessage()

        # Fill the basic headers
        eml['From'] = parse_email_address(msg.sender)
        eml['To'] = parse_email_address(msg.to)
        if msg.cc:
            eml['Cc'] = parse_email_address(msg.cc)
        if hasattr(msg, 'bcc') and msg.bcc:
            eml['Bcc'] = parse_email_address(msg.bcc)
        eml['Subject'] = msg.subject or ''

        # Add date header
        if hasattr(msg, 'date') and msg.date:
            try:
                eml['Date'] = formatdate(msg.date.timestamp(), localtime=True)
            except (AttributeError, TypeError):
                # Fallback to current date if there's an issue with the date format
                eml['Date'] = formatdate(localtime=True)
        else:
            eml['Date'] = formatdate(localtime=True)

        # Add message ID and other headers if available
        if hasattr(msg, 'message_id') and msg.message_id:
            eml['Message-ID'] = msg.message_id

        # Handle body content - prefer HTML if available
        body_text = msg.body or ''
        html_body = None

        # Properly handle HTML body extraction
        if hasattr(msg, 'htmlBody') and msg.htmlBody:
            html_body = msg.htmlBody
        elif hasattr(msg, 'html') and msg.html:
            html_body = msg.html

        if html_body:
            # Include both plain text and HTML versions - FIX: Added maintype='text'
            eml.set_content(body_text, subtype='plain')
            eml.add_alternative(html_body, maintype='text', subtype='html')
        else:
            # Only plain text available
            eml.set_content(body_text, subtype='plain')

        # Handle attachments
        logger.info(f"Processing {len(msg.attachments)} attachments")
        for attachment in msg.attachments:
            try:
                # Get filename (prefer long name if available)
                filename = None
                if hasattr(attachment, 'longFilename') and attachment.longFilename:
                    filename = attachment.longFilename
                elif hasattr(attachment, 'shortFilename') and attachment.shortFilename:
                    filename = attachment.shortFilename
                else:
                    filename = 'attachment'

                # Get attachment data
                data = attachment.data
                if not data:
                    logger.warning(f"Skipping empty attachment: {filename}")
                    continue

                # Determine content type
                content_type = None
                if hasattr(attachment, 'mimetype') and attachment.mimetype:
                    content_type = attachment.mimetype
                else:
                    # Guess MIME type from filename
                    content_type, _ = mimetypes.guess_type(filename)

                if content_type:
                    maintype, subtype = content_type.split('/', 1)
                else:
                    maintype, subtype = 'application', 'octet-stream'

                # Add the attachment with explicit maintype and subtype
                eml.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)
                logger.info(f"Added attachment: {filename} ({maintype}/{subtype})")
            except Exception as e:
                logger.error(f"Error processing attachment: {str(e)}")
                # Continue with next attachment even if this one fails

        # Write the EML file
        with open(eml_path, 'wb') as f:
            f.write(eml.as_bytes())

        logger.info(f"Successfully converted '{msg_path}' to '{eml_path}'")
        return True

    except Exception as e:
        logger.error(f"Error converting {msg_path} to EML: {str(e)}")
        # Print more detailed error information for debugging
        logger.error(traceback.format_exc())
        return False
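To see the MIME structure that the body and attachment steps produce, the following sketch builds a comparable message using only the standard library. It is an illustration under assumptions, not the module's code: `build_message` is a hypothetical helper, and the bodies are assumed to be `str`, which is why `add_alternative` is called without `maintype` here (the source passes `maintype='text'`, which is what the `bytes` variant of the call expects).

```python
import mimetypes
from email.message import EmailMessage


def build_message(plain, html, attachments):
    """Build an EmailMessage from str bodies and (filename, bytes) attachments.

    Hypothetical helper for illustration only.
    """
    eml = EmailMessage()
    eml['Subject'] = 'Example'
    eml.set_content(plain, subtype='plain')
    if html:
        # Turns the message into multipart/alternative (plain + html)
        eml.add_alternative(html, subtype='html')
    for filename, data in attachments:
        # Guess the MIME type from the filename, else use a generic binary type
        content_type, _ = mimetypes.guess_type(filename)
        if content_type:
            maintype, subtype = content_type.split('/', 1)
        else:
            maintype, subtype = 'application', 'octet-stream'
        # The first attachment wraps the existing body in multipart/mixed
        eml.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return eml
```

With an HTML body and at least one attachment, the result is a multipart/mixed message whose first part is the multipart/alternative body, followed by one part per attachment.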
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| msg_path | - | - | positional_or_keyword |
| eml_path | - | - | positional_or_keyword |
Parameter Details
msg_path: String path to the input .msg file to be converted. Must be a valid file path pointing to an existing Microsoft Outlook .msg file. The function checks for file existence before processing.
eml_path: String path where the output .eml file will be saved. Should include the desired filename with .eml extension. The directory must be writable. If the file exists, it will be overwritten.
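Because an existing output file is overwritten and the target directory has to be present, a caller may want to prepare `eml_path` before the call. The sketch below is one way to do that; `unique_eml_path` is a hypothetical helper and not part of the documented module.

```python
from pathlib import Path


def unique_eml_path(eml_path):
    """Create the parent directory and return a path that does not exist yet.

    Hypothetical helper: appends _1, _2, ... to the file stem on collisions.
    """
    path = Path(eml_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    candidate, counter = path, 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        counter += 1
    return candidate
```

The returned path can be passed to msg_to_eml as `str(unique_eml_path('out/mail.eml'))`.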
Return Value
Returns a boolean: True if the conversion succeeded and the .eml file was written, False if the input file is missing or an error occurred while parsing or writing. A failure on an individual attachment is logged and that attachment is skipped; it does not by itself make the function return False. Errors are logged via the module-level logger object.
Dependencies
- extract_msg
- os
- mimetypes
- logging
- email
- traceback
Required Imports
import extract_msg
import os
import mimetypes
import logging
import traceback
from email.message import EmailMessage
from email.utils import formatdate
Usage Example
import logging
import os
from email.message import EmailMessage
from email.utils import formatdate
import extract_msg
import mimetypes
import traceback

# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Define parse_email_address helper function
def parse_email_address(address):
    """Helper to parse email addresses"""
    if not address:
        return ''
    return str(address)

# Convert a .msg file to .eml
msg_file = 'path/to/email.msg'
eml_file = 'path/to/output.eml'

success = msg_to_eml(msg_file, eml_file)
if success:
    print(f'Successfully converted {msg_file} to {eml_file}')
else:
    print('Conversion failed. Check logs for details.')
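The module's real parse_email_address is not shown on this page, so the version below is only a hypothetical stand-in. It assumes that recipient fields arrive as a single string with addresses separated by semicolons (the Outlook convention) and rewrites them with the comma separator used in .eml headers.

```python
def parse_email_address(address):
    """Normalize a semicolon-separated recipient string for an .eml header.

    Hypothetical stand-in; the module's actual helper may behave differently.
    """
    if not address:
        return ''
    # Assumption: ';' separates recipients and never occurs in a display name
    parts = [part.strip() for part in str(address).split(';')]
    return ', '.join(part for part in parts if part)
```

A display name that itself contains a semicolon would be split incorrectly, so this is a starting point rather than a full address parser.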
Best Practices
- Ensure the 'parse_email_address' function is defined in the same module before calling this function
- Configure a logger object before using this function to capture detailed conversion logs
- Verify that the input .msg file is not corrupted and is a valid Microsoft Outlook message file
- Ensure sufficient disk space is available for the output .eml file, especially when dealing with large attachments
- The function handles missing or malformed data gracefully with fallbacks, but review logs for warnings about skipped content
- For batch conversions, wrap this function in error handling to prevent one failed conversion from stopping the entire process
- The function preserves both plain text and HTML versions of the email body when available, with HTML as an alternative part
- Attachment MIME types are auto-detected if not provided in the .msg file, defaulting to 'application/octet-stream' for unknown types
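The batch-conversion advice above can be sketched as follows. `convert_directory` is a hypothetical helper; it takes the converter as a parameter (for example msg_to_eml) so that the loop can be tried without any .msg tooling installed.

```python
from pathlib import Path


def convert_directory(src_dir, dst_dir, convert):
    """Run convert(msg_path, eml_path) for every .msg file in src_dir.

    Hypothetical helper. Returns (converted, failed) lists of file names.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for msg_path in sorted(Path(src_dir).glob('*.msg')):
        eml_path = dst / (msg_path.stem + '.eml')
        try:
            ok = convert(str(msg_path), str(eml_path))
        except Exception:
            # One bad file should not stop the rest of the batch
            ok = False
        (converted if ok else failed).append(msg_path.name)
    return converted, failed
```

Calling `convert_directory('inbox', 'converted', msg_to_eml)` would then return the names of the files that succeeded and those that need a look in the logs.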
Similar Components
AI-powered semantic similarity - components with related functionality:
- function msg_to_eml_alternative (91.9% similar)
- function msg_to_pdf_improved (85.4% similar)
- function msg_to_pdf (82.0% similar)
- class FileCloudEmailProcessor (67.8% similar)
- function generate_html_from_msg (67.1% similar)