function msg_to_eml_alternative
Converts Microsoft Outlook .msg files to .eml (email) format using the extract_msg library, preserving email headers, body content (plain text and HTML), and attachments.
/tf/active/vicechatdev/msg_to_eml.py
152 - 259
complex
Purpose
This function provides an alternative way to convert .msg files to .eml format when the standard conversion method fails. If the extract_msg Message object exposes a save_email method, the function delegates to it. Otherwise it writes the .eml file by hand: basic headers, a multipart/mixed boundary, the plain-text and HTML bodies, and base64-encoded attachments. This is useful for email migration, archival, or integration with systems that require .eml format.
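For comparison, the manual construction described above can be approximated with the standard library's email.message.EmailMessage class, which generates boundaries and applies transfer encodings itself. This is a minimal sketch and not the function's actual implementation; build_eml_bytes and its parameters are hypothetical names introduced for illustration.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_eml_bytes(sender, to, subject, body, html=None, attachments=()):
    """Hypothetical helper: assemble a MIME message and return it as bytes.

    `attachments` is an iterable of (filename, data_bytes, mime_type) tuples.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = to
    message["Subject"] = subject or ""
    message["Date"] = formatdate(localtime=True)

    # Plain-text part; the library picks a suitable transfer encoding.
    message.set_content(body or "")

    # Optional HTML alternative to the plain-text body.
    if html:
        message.add_alternative(html, subtype="html")

    # Attachments are base64-encoded and line-wrapped by the library.
    for filename, data, mime_type in attachments:
        if not mime_type or "/" not in mime_type:
            mime_type = "application/octet-stream"
        maintype, _, subtype = mime_type.partition("/")
        message.add_attachment(
            data, maintype=maintype, subtype=subtype, filename=filename
        )

    return message.as_bytes()


if __name__ == "__main__":
    raw = build_eml_bytes(
        "alice@example.com",
        "bob@example.com",
        "Report",
        "See attached.",
        html="<p>See attached.</p>",
        attachments=[("report.bin", b"\x00\x01\x02", "application/octet-stream")],
    )
    print(raw.decode("ascii", errors="replace")[:300])
```

Letting the library assemble the message avoids hand-written boundary and encoding logic, at the cost of less control over the exact byte layout.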
Source Code
def msg_to_eml_alternative(msg_path, eml_path):
    """Alternative conversion approach using extract_msg's built-in functionality"""
    try:
        if not os.path.exists(msg_path):
            logger.error(f"Input file not found: {msg_path}")
            return False
        # Load the .msg file
        logger.info(f"Using alternative conversion method for: {msg_path}")
        msg = extract_msg.Message(msg_path)
        # Try direct raw EML content extraction if available
        if hasattr(msg, 'save_email'):
            msg.save_email(eml_path)
            logger.info(f"Successfully converted '{msg_path}' to '{eml_path}' using built-in save_email")
            return True
        # Use extract_msg's built-in properties to manually create the EML
        with open(eml_path, 'w', encoding='utf-8') as f:
            # Write basic headers
            f.write(f"From: {msg.sender}\n")
            f.write(f"To: {msg.to}\n")
            if msg.cc:
                f.write(f"Cc: {msg.cc}\n")
            f.write(f"Subject: {msg.subject or ''}\n")
            # Add date
            if hasattr(msg, 'date') and msg.date:
                try:
                    f.write(f"Date: {msg.date}\n")
                except:
                    f.write(f"Date: {formatdate(localtime=True)}\n")
            else:
                f.write(f"Date: {formatdate(localtime=True)}\n")
            # Add content type header for MIME message
            f.write("MIME-Version: 1.0\n")
            # Create a simple multipart message
            boundary = "----=_NextPart_" + os.urandom(16).hex()
            f.write(f'Content-Type: multipart/mixed; boundary="{boundary}"\n\n')
            # Add message separator
            f.write(f"--{boundary}\n")
            # Add plain text body
            f.write('Content-Type: text/plain; charset="utf-8"\n')
            f.write('Content-Transfer-Encoding: quoted-printable\n\n')
            f.write(msg.body or '')
            f.write(f"\n\n--{boundary}\n")
            # Add HTML body if available
            html_content = None
            if hasattr(msg, 'htmlBody') and msg.htmlBody:
                html_content = msg.htmlBody
            elif hasattr(msg, 'html') and msg.html:
                html_content = msg.html
            if html_content:
                f.write('Content-Type: text/html; charset="utf-8"\n')
                f.write('Content-Transfer-Encoding: quoted-printable\n\n')
                f.write(html_content)
                f.write(f"\n\n--{boundary}\n")
            # Add attachments
            for attachment in msg.attachments:
                try:
                    # Get filename
                    filename = getattr(attachment, 'longFilename', None) or getattr(attachment, 'shortFilename', None) or 'attachment'
                    # Determine content type
                    content_type = None
                    if hasattr(attachment, 'mimetype') and attachment.mimetype:
                        content_type = attachment.mimetype
                    else:
                        content_type, _ = mimetypes.guess_type(filename)
                    if not content_type:
                        content_type = 'application/octet-stream'
                    # Write attachment headers
                    f.write(f'Content-Type: {content_type}; name="{filename}"\n')
                    f.write('Content-Transfer-Encoding: base64\n')
                    f.write(f'Content-Disposition: attachment; filename="{filename}"\n\n')
                    # Write base64 encoded attachment data
                    import base64
                    if attachment.data:
                        encoded_data = base64.b64encode(attachment.data).decode('ascii')
                        # Write in chunks of 76 characters for proper base64 format
                        for i in range(0, len(encoded_data), 76):
                            f.write(encoded_data[i:i+76] + '\n')
                    f.write(f"\n--{boundary}\n")
                except Exception as e:
                    logger.error(f"Error processing attachment {filename}: {str(e)}")
            # Close the multipart message
            f.write(f"--{boundary}--\n")
        logger.info(f"Successfully converted '{msg_path}' to '{eml_path}' using manual alternative method")
        return True
    except Exception as e:
        logger.error(f"Error in alternative conversion of {msg_path} to EML: {str(e)}")
        logger.error(traceback.format_exc())
        return False
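The listing declares Content-Transfer-Encoding: quoted-printable for both bodies but writes the body text unencoded, so non-ASCII characters, literal = signs, or long lines may be decoded incorrectly by a strict mail client. One possible remedy, sketched here with the standard quopri module, is to encode the text before writing it. The helper encode_body_qp is hypothetical and not part of the original module; accepting bytes as well as str is an assumption made to cover libraries that return HTML bodies as bytes.

```python
import quopri


def encode_body_qp(text):
    """Hypothetical helper: return `text` as quoted-printable ASCII.

    str input is encoded to UTF-8 first, matching the charset the
    listing declares in its Content-Type headers; bytes input is
    assumed to be UTF-8 already and is encoded as-is.
    """
    if text is None:
        raw = b""
    elif isinstance(text, bytes):
        raw = text
    else:
        raw = text.encode("utf-8")
    return quopri.encodestring(raw).decode("ascii")


if __name__ == "__main__":
    sample = "Café price = 5€\n"
    encoded = encode_body_qp(sample)
    print(encoded)
    # Decoding restores the original text.
    restored = quopri.decodestring(encoded.encode("ascii")).decode("utf-8")
    print(restored == sample)
```

With such a helper, the body writes in the listing would become f.write(encode_body_qp(msg.body)) and f.write(encode_body_qp(html_content)).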
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| msg_path | - | - | positional_or_keyword |
| eml_path | - | - | positional_or_keyword |
Parameter Details
msg_path: String path to the input .msg file to be converted. Must be a valid file path pointing to an existing Microsoft Outlook .msg file.
eml_path: String path where the output .eml file will be saved. The directory must exist and be writable. If the file exists, it will be overwritten.
Return Value
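Returns a boolean value: True if the conversion completed, False if an error occurred during the conversion process (e.g., file not found, parsing errors, write errors). An error on an individual attachment is logged but does not change the return value, so True does not guarantee that every attachment was written.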
Dependencies
- extract_msg
- os
- mimetypes
- logging
- email
- traceback
- base64
Required Imports
import extract_msg
import os
import mimetypes
import logging
import traceback
from email.utils import formatdate
Conditional/Optional Imports
These imports are only needed under specific conditions:
import base64
Condition: only when processing attachments in the manual conversion path
Required (conditional)
Usage Example
import logging
import extract_msg
import os
import mimetypes
import traceback
from email.utils import formatdate
# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
# Convert a .msg file to .eml
msg_file = '/path/to/email.msg'
eml_file = '/path/to/output.eml'
success = msg_to_eml_alternative(msg_file, eml_file)
if success:
    print(f'Successfully converted {msg_file} to {eml_file}')
else:
    print('Conversion failed. Check logs for details.')
Best Practices
- The function logs through a module-level logger object; define and configure it (for example with logging.getLogger) before calling this function
- Verify that the input .msg file exists and is readable before calling
- Check the return value to determine if conversion was successful
- Handle the case where the output directory may not exist by creating it beforehand
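- The output file is opened in text mode with UTF-8 encoding. Attachments are base64-encoded and therefore safe, but the plain-text and HTML bodies are written unencoded under a quoted-printable header, so bodies containing non-ASCII characters, = signs, or very long lines may not render correctly in strict mail clients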
- The function attempts to use extract_msg's built-in save_email method first, falling back to manual construction if unavailable
- Large attachments will be loaded into memory during base64 encoding, so ensure sufficient memory is available
- MIME boundaries are generated from os.urandom(16), so a collision with body content or with another conversion is extremely unlikely
- Errors are caught and logged rather than raised, and the outer handler also logs the full traceback, so check the logs whenever the function returns False
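Several of the practices above (checking the input, creating the output directory, checking the return value) can be bundled into a small wrapper. This is a sketch under stated assumptions: convert_safely is a hypothetical name, and the converter argument stands for any callable with the same (msg_path, eml_path) signature and boolean result as msg_to_eml_alternative.

```python
import os
import tempfile


def convert_safely(converter, msg_path, eml_path):
    """Hypothetical wrapper: run pre-flight checks, then call `converter`."""
    # The input must exist and be readable.
    if not os.path.isfile(msg_path) or not os.access(msg_path, os.R_OK):
        return False

    # Create the output directory if it is missing.
    out_dir = os.path.dirname(os.path.abspath(eml_path))
    os.makedirs(out_dir, exist_ok=True)

    # Treat a True result with no output file as a failure.
    ok = converter(msg_path, eml_path)
    return bool(ok) and os.path.isfile(eml_path)


if __name__ == "__main__":
    def fake_converter(src, dst):
        # Stand-in for the real function so the sketch runs without extract_msg.
        with open(dst, "w", encoding="utf-8") as handle:
            handle.write("Subject: demo\n\nbody\n")
        return True

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "mail.msg")
        with open(source, "wb") as handle:
            handle.write(b"placeholder")
        target = os.path.join(workdir, "out", "nested", "mail.eml")
        print(convert_safely(fake_converter, source, target))
```

In real use the call would be convert_safely(msg_to_eml_alternative, msg_file, eml_file).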
Similar Components
AI-powered semantic similarity - components with related functionality:
- function msg_to_eml (91.9% similar)
- function msg_to_pdf_improved (81.7% similar)
- function msg_to_pdf (76.1% similar)
- function generate_html_from_msg (65.5% similar)
- function generate_simple_html_from_eml (65.0% similar)