emlx Files

For the moment, I’m using Apple’s Mail as my primary email client. It’s bafflingly slow at displaying messages at times (they’re simple text files!), sometimes gets stuck updating, and won’t let me tell it that ihug’s SSL certificate is OK (which is partly ihug’s fault for buying a cheap one instead of one that programs would recognise). Still, it does have some nice features, and it beats any of the other mail clients I’ve tried.

As of Tiger, Mail stores messages in individual emlx files, scattered through various folders in the ~/Library/Mail folder. For use with SpamBayes’ test setup (as well as others, like the TREC one), I need messages in individual files in plain RFC2822 format.

What I needed was a simple export script (much like the existing Outlook export script – except hopefully faster and including attachments) that would create RFC2822 copies of the emlx files in the standard SpamBayes format (ham and spam directories containing a reservoir directory containing messages as individual text files).
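To make that target layout concrete, here is a minimal sketch of writing one message into it. This is not the actual SpamBayes export code: the function name export_message, the root argument, and the numbered ".txt" file names are all assumptions made for illustration. Only the ham/spam directories and the reservoir subdirectory come from the description above.

```python
from pathlib import Path


def export_message(message_bytes: bytes, root: Path, is_spam: bool, index: int) -> Path:
    """Write one RFC2822 message into <root>/<ham|spam>/reservoir/.

    Hypothetical helper: the name, arguments, and file naming are
    illustrative assumptions, not part of SpamBayes.
    """
    # Layout described in the post: ham and spam directories,
    # each containing a reservoir directory of individual files.
    target_dir = root / ("spam" if is_spam else "ham") / "reservoir"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Assumed naming scheme: a sequential number plus ".txt".
    target = target_dir / f"{index}.txt"

    # Write bytes rather than text so the message is copied unchanged,
    # whatever encoding its body or attachments use.
    target.write_bytes(message_bytes)
    return target
```

Writing raw bytes avoids guessing at character sets, which matters once attachments are included.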

I had thought that this might be quite difficult (take a look at the Outlook export script!) since emlx is a proprietary format. Thankfully, I discovered that the first line is the size of the message in bytes (as text), followed by the RFC2822 message itself, followed by a plist containing various Mail information I’m not interested in (flags, sender, etc.). Nice to see that Apple can keep things simple.
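The layout just described can be sketched in a few lines of Python. This is an illustration only, not the code that ships with SpamBayes: the names split_emlx and parse_emlx are made up here, and the sketch assumes the byte count on the first line covers exactly the message that follows the newline.

```python
import email


def split_emlx(data: bytes):
    """Split raw emlx bytes into (message_bytes, plist_bytes).

    Hypothetical helper. Assumes the first line is the message length
    in bytes, written as ASCII digits, and that the message starts
    immediately after that line's newline.
    """
    header, sep, rest = data.partition(b"\n")
    if not sep:
        raise ValueError("no length line found")
    length = int(header.decode("ascii").strip())
    if length > len(rest):
        raise ValueError("declared length exceeds the remaining data")
    # Everything after the message is the trailing plist.
    return rest[:length], rest[length:]


def parse_emlx(data: bytes):
    """Return (email.message.Message, plist_bytes) for raw emlx bytes."""
    message_bytes, plist_bytes = split_emlx(data)
    return email.message_from_bytes(message_bytes), plist_bytes
```

For an export, only the first element returned by split_emlx is needed; the plist bytes can simply be ignored.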

So the SpamBayes distribution now contains a simple export_apple_mail.py script that will do the job.
