Maildir

From Wikipedia, the free encyclopedia

Maildir is a widely used format for storing e-mail that does not require application-level file locking to maintain message integrity as messages are added, moved and deleted. Each message is kept in a separate file with a unique name. All changes are made using atomic filesystem operations, so concurrency is handled by the filesystem rather than by application-level locks. A Maildir is a directory (often named Maildir) with three subdirectories named tmp, new, and cur.
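
The layout can be sketched in Python. This is a minimal illustration assuming a POSIX-style filesystem; create_maildir and is_maildir are hypothetical helper names, not part of any library:

```python
import os
import tempfile

# The three subdirectories that make up a Maildir.
MAILDIR_SUBDIRS = ("tmp", "new", "cur")


def create_maildir(path: str) -> None:
    """Create `path` and its tmp, new and cur subdirectories if they are missing."""
    for sub in MAILDIR_SUBDIRS:
        # exist_ok makes the call idempotent; 0o700 (owner-only) is a common
        # choice for the subdirectories, not something the format mandates.
        os.makedirs(os.path.join(path, sub), mode=0o700, exist_ok=True)


def is_maildir(path: str) -> bool:
    """Return True if `path` has all three Maildir subdirectories."""
    return all(os.path.isdir(os.path.join(path, sub)) for sub in MAILDIR_SUBDIRS)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        md = os.path.join(root, "Maildir")
        create_maildir(md)
        print(is_maildir(md))          # True
        print(sorted(os.listdir(md)))  # ['cur', 'new', 'tmp']
```

Nothing else is needed to start using the directory: delivery and reading are entirely a matter of creating, linking and renaming files inside these three subdirectories.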


Specifications

The Maildir concept is remarkable for being both simple and unmaintained, even though large numbers of users depend on it.

Maildir

Daniel J. Bernstein, the author of qmail, djbdns, and various other software, wrote the original and only Maildir specification [1]. Bernstein has published no follow-ups since, and there has been no effort to turn the specification into a standard. It was written for one particular mail suite (Bernstein's qmail) but is general enough to be implemented in many programs. Over time, many independent implementations have uncovered a small number of shortcomings, documented in this article, but Bernstein has never updated the specification.

Maildir++

Sam Varshavchik, the author of the Courier Mail Server and other software, wrote an extension [2] to the Maildir format called Maildir++ to support subfolders and mail quotas. Maildir++ directories contain subdirectories whose names start with a '.' (dot) and that are themselves Maildir++ folders. This extension therefore violates the Maildir specification, which provides an exhaustive list of the possible contents of a Maildir; however, it is a compatible violation, and much Maildir software supports Maildir++.
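
A scan for Maildir++ folders might look like the following Python sketch. It treats any dot-prefixed subdirectory that itself contains tmp, new and cur as a folder; list_maildirpp_folders is a hypothetical name, and the check is a simplification of what Courier actually does:

```python
import os


def _is_maildir(path: str) -> bool:
    """True if `path` contains the tmp, new and cur subdirectories."""
    return all(os.path.isdir(os.path.join(path, s)) for s in ("tmp", "new", "cur"))


def list_maildirpp_folders(root: str) -> list[str]:
    """Return Maildir++ folder names under `root`, without the leading dot."""
    folders = []
    for entry in sorted(os.listdir(root)):
        # Folders are dot-prefixed subdirectories that are themselves Maildirs;
        # other dot-files and the plain tmp/new/cur entries are skipped.
        if entry.startswith(".") and _is_maildir(os.path.join(root, entry)):
            folders.append(entry[1:])
    return folders
```

For a Maildir containing .Sent/tmp, .Sent/new and .Sent/cur, this returns ['Sent']. Python's standard mailbox.Maildir class offers a comparable list_folders() method for these Courier-style folders.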

Problem Space Addressed by Maildir

Mail needs to be stored in these circumstances:

  • By an SMTP MTA, after the message has been received from a remote mail server and while it waits to be delivered elsewhere. The storage area used by the MTA is often called a spool.
  • By an IMAP mailstore, which serves email to mail client software (MUAs).
  • In a local user account where the user can read email using an MUA that reads the mail data directly rather than via a network protocol.
  • In other storage and processing situations, such as when filtering spam.

RFC 822 and related standards define an email message as a sequence of lines of text, with strict rules governing the header lines at its start. This matches the idea of a file very well. Maildir's one-file-per-message design matches precisely what can be observed when watching email transit a network by means of protocols such as SMTP. An MTA typically processes batches of email sequentially, so again one message per file is a good match.

A directory of files, each containing one message, is not sufficient on its own for a mailstore or any other use requiring random access to email. Many implementors use a database because databases are designed for indexing and searching. As of 2007, filesystems usually gave much better access times than databases, so the questions facing implementors come down to indexing methods and programming convenience versus speed, efficiency, reuse of existing technology, and reliability. The Cyrus IMAP server, the MH Message Handling System, the Dovecot IMAP server and the UW IMAP server all have private, mutually incompatible file-per-message storage formats. (Dovecot and UW IMAP also implement formats that can be accessed by other software.)

Technical Operation

The process that delivers an e-mail message writes it to a file in the tmp directory with a unique filename. The current algorithm for generating the unique filename combines the time, the host name, and a number of pseudo-random parameters to ensure uniqueness.[1]
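
One possible way to combine those ingredients is sketched below in Python. The <seconds>.M<microseconds>P<pid>Q<counter>.<host> layout mirrors what CPython's mailbox module generates and is only one of several formats in use; unique_name is a hypothetical helper:

```python
import itertools
import os
import socket
import time
from typing import Optional

# Counts deliveries made by this process (the "Q" field), so two names
# generated within the same microsecond still differ.
_deliveries = itertools.count(1)


def unique_name(hostname: Optional[str] = None) -> str:
    """Return a Maildir unique name built from time, PID, a counter and the host."""
    now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    host = socket.gethostname() if hostname is None else hostname
    # '/' would act as a path separator and ':' would be confused with the
    # info suffix, so both are escaped as octal sequences (as CPython does).
    host = host.replace("/", r"\057").replace(":", r"\072")
    return f"{seconds}.M{micros}P{os.getpid()}Q{next(_deliveries)}.{host}"


if __name__ == "__main__":
    print(unique_name("mail.example.org"))  # <seconds>.M<micros>P<pid>Q1.mail.example.org
```

The per-process counter and the PID distinguish deliveries that happen at the same instant on one host, while the host name distinguishes machines sharing a Maildir (for example over NFS).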

The delivery process stores the message in the maildir by creating and writing to tmp/unique, and then hard-linking this file to new/unique. Finally, the delivery program unlinks the file in tmp, although the specification does not formally require it to do so. This sequence guarantees that a maildir-reading program will not see a partially written message: the message appears in new only once it has been completely written, and MUAs never look in tmp.
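
A Python sketch of that tmp-then-new sequence, assuming a POSIX system where os.link() is available. The deliver function and its simplified naming scheme are hypothetical; real delivery agents add more error handling and report failures back to the MTA:

```python
import itertools
import os
import socket
import time

_counter = itertools.count(1)


def deliver(maildir: str, message: bytes) -> str:
    """Write `message` to tmp/, hard-link it into new/, then unlink the tmp/ name."""
    now = time.time()
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    # Simplified unique name: time, microseconds, PID, per-process counter, host.
    unique = f"{int(now)}.M{int(now % 1 * 1e6)}P{os.getpid()}Q{next(_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL makes creation fail instead of overwriting an existing tmp/ file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(message)
            f.flush()
            os.fsync(f.fileno())  # data must be on disk before it becomes visible
    except BaseException:
        os.unlink(tmp_path)
        raise

    try:
        # link() fails with FileExistsError rather than replacing new/<unique>,
        # and the message appears in new/ only now that it is complete.
        os.link(tmp_path, new_path)
    finally:
        # Drop the tmp/ name whether or not the link succeeded; on failure the
        # exception propagates and the delivery counts as unsuccessful.
        os.unlink(tmp_path)
    return new_path
```

Linking rather than renaming means an existing new/<unique> is never silently replaced; rename() would overwrite it.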

When the mail user agent process finds messages in the new directory, it moves them to cur (using the same link-then-unlink strategy) and appends an informational suffix to the filename before reading them. The informational suffix consists of a colon (separating the unique part of the filename from the actual information), a '2', a comma, and various flags. The '2' specifies, loosely speaking, the version of the information that follows the comma. '2' is the only currently officially specified version; '1' denotes an experimental version, presumably used while the Maildir format was under development.
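
The suffix handling can be sketched in Python as follows. The flag letters in the comment (D, F, P, R, S, T) come from Bernstein's document rather than from this article, and split_info, info_name and move_to_cur are hypothetical helpers:

```python
import os
from typing import Optional, Tuple

# Version-2 flags: D=draft, F=flagged, P=passed (forwarded/bounced),
# R=replied, S=seen, T=trashed. They are kept in ASCII order.


def split_info(filename: str) -> Tuple[str, Optional[str], str]:
    """Split 'unique:2,FS' into ('unique', '2', 'FS').

    A name with no info suffix (typical in new/) gives (filename, None, '').
    """
    unique, sep, info = filename.partition(":")
    if not sep:
        return unique, None, ""
    version, _, flags = info.partition(",")
    return unique, version, flags


def info_name(unique: str, flags: str = "") -> str:
    """Build 'unique:2,<flags>' with duplicate flags removed and letters sorted."""
    return f"{unique}:2,{''.join(sorted(set(flags)))}"


def move_to_cur(maildir: str, unique: str, flags: str = "") -> str:
    """Move new/<unique> to cur/<unique>:2,<flags> by link() then unlink()."""
    src = os.path.join(maildir, "new", unique)
    dst = os.path.join(maildir, "cur", info_name(unique, flags))
    os.link(src, dst)
    os.unlink(src)
    return dst
```

For example, split_info("1190000000.M1P2Q3.host:2,RS") returns ('1190000000.M1P2Q3.host', '2', 'RS'), which describes a message that has been replied to and seen.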

Issues with lockless operation

Daniel J. Bernstein designed Maildir to be safely writable by multiple concurrent writers without any form of locking, even over NFS. To a large extent this works well, but the design does not take into account the real-world limitations of today's filesystems. The problem is that the readdir() system call does not guarantee to return all the files in a directory if the directory is being modified at the same time. Because Maildir stores flags in filenames, updating a message's flags means renaming its file; if one process does so, another process's readdir() call might skip the file, leading that process to believe the message was deleted. When the process lists the messages again, the "deleted" message suddenly reappears. Some mail-accessing programs layer their own locking on top of Maildir in an attempt to prevent these kinds of problems. Dovecot, for example, uses its own non-standard locking with Maildir.
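
As an illustration of layering locking on top of Maildir, the Python sketch below serializes directory listings and flag-changing renames with flock(), which is Unix-only. The lock-file name and helpers are hypothetical and unrelated to Dovecot's actual scheme; the lock only helps if every process touching the Maildir follows the same convention, and flock() behaviour over NFS varies between systems:

```python
import fcntl
import os
from contextlib import contextmanager

LOCK_NAME = ".example-maildir.lock"  # hypothetical lock-file name


@contextmanager
def maildir_lock(maildir: str):
    """Hold an exclusive advisory flock() on a lock file in the Maildir root."""
    fd = os.open(os.path.join(maildir, LOCK_NAME), os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds it
        yield
    finally:
        os.close(fd)  # closing the only descriptor also releases the lock


def list_messages(maildir: str) -> list[str]:
    """List new/ and cur/ under the lock so cooperating renamers cannot interleave."""
    with maildir_lock(maildir):
        return [
            os.path.join(sub, name)
            for sub in ("new", "cur")
            for name in sorted(os.listdir(os.path.join(maildir, sub)))
        ]


def rename_in_cur(maildir: str, old: str, new: str) -> None:
    """Rename cur/<old> to cur/<new> (for example, to change flags) under the lock."""
    with maildir_lock(maildir):
        os.rename(os.path.join(maildir, "cur", old), os.path.join(maildir, "cur", new))
```

The trade-off is the one Maildir was designed to avoid: readers and flag updaters now wait for each other, in exchange for listings that cannot miss a file being renamed by a cooperating process.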

Dovecot criticisms

The Dovecot project, which implements an IMAP/POP3 server with built-in Maildir support, has raised some additional issues with the specification. This critique portrays the Maildir delivery protocol as involving the following four steps (annotations from the Dovecot critique removed):

  1. Create a unique filename.
  2. Call stat(tmp/<filename>). If stat() finds a file, wait 2 seconds and go back to step 1.
  3. Create tmp/<filename> and write the message to it.
  4. link() it into the new/ directory.
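
Steps 1 and 2 can be sketched in Python as below. The clock parameter is a hypothetical hook that lets a test simulate a clock repeating an earlier reading; because there is a real window between steps 2 and 3, a careful implementation would also create the file with O_EXCL in step 3 so that a late collision fails instead of overwriting:

```python
import os
import socket
import time
from typing import Callable


def reserve_tmp_name(maildir: str, wait: float = 2.0, attempts: int = 5,
                     clock: Callable[[], float] = time.time) -> str:
    """Steps 1-2: return a tmp/ path for which stat() finds no existing file."""
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    for _ in range(attempts):
        now = clock()
        # Step 1: a candidate name from the time, the PID and the host name.
        name = f"{int(now)}.M{int(now % 1 * 1e6)}P{os.getpid()}.{host}"
        path = os.path.join(maildir, "tmp", name)
        # Step 2: if stat() finds a file, wait and start again from step 1.
        try:
            os.stat(path)
        except FileNotFoundError:
            return path
        time.sleep(wait)
    raise RuntimeError("no unused tmp/ name found")
```

Passing a clock that returns the same value twice reproduces the collision that step 2 is meant to catch: the second call finds the leftover tmp/ file, waits, and tries again with the next clock reading.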

The critique then claims that:

Only the first step is what really guarantees that the mails won't get overwritten, the rest just sounds nice. Even though they might catch a problem once in a while, they give no guaranteed protection ...

for two technical reasons:

  1. Step 2 is pointless because there is a race condition between steps 2 and 3. The PID/host combination by itself should already guarantee that it never finds such a file.
  2. In step 4, link() can succeed even though a message with the same filename has already been delivered to the maildir, because a mail reader might already have moved the original copy to the cur/ directory.
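
The second point can be reproduced with a short Python sketch, assuming a POSIX filesystem. It simulates a delivery that regenerates a name (the hypothetical 1000.M0P42.host) whose earlier message a mail reader has already moved to cur/; because link() only checks new/, it succeeds:

```python
import os
import tempfile


def duplicate_name_demo() -> tuple[list[str], list[str]]:
    """Return (new/ listing, cur/ listing) after delivering a reused unique name."""
    with tempfile.TemporaryDirectory() as md:
        for sub in ("tmp", "new", "cur"):
            os.mkdir(os.path.join(md, sub))
        name = "1000.M0P42.host"  # hypothetical name regenerated by a repeating clock

        # The first message with this name was read and moved to cur/ long ago.
        with open(os.path.join(md, "cur", name + ":2,S"), "w") as f:
            f.write("first message\n")

        # A second delivery reuses the name: tmp/<name> and new/<name> are both
        # free, so a stat() of tmp/ finds nothing and link() succeeds.
        tmp_path = os.path.join(md, "tmp", name)
        with open(tmp_path, "w") as f:
            f.write("second message\n")
        os.link(tmp_path, os.path.join(md, "new", name))
        os.unlink(tmp_path)

        # The same unique name now exists in both new/ and cur/.
        return (sorted(os.listdir(os.path.join(md, "new"))),
                sorted(os.listdir(os.path.join(md, "cur"))))


if __name__ == "__main__":
    new_entries, cur_entries = duplicate_name_demo()
    print(new_entries)  # ['1000.M0P42.host']
    print(cur_entries)  # ['1000.M0P42.host:2,S']
```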

This analysis misconstrues the purpose of step 2. There would indeed be a race condition if step 2 were intended to mitigate a situation in which a badly behaved operating environment created concurrent processes with the same PID that could race against each other. The analysis is correct in asserting that step 1 constitutes the primary uniqueness guarantee, but only on a continuously operating host with a monotonic system clock, a condition it fails to note. When the maildir host is rebooted, system clock recalibration or a misconfigured clock could cause a filename that already exists in the maildir from a previous uptime interval to be generated again. Step 2 ensures that under this condition (rare on a stable mail system) a new mail item will not clobber an existing one.

The criticism that link() may succeed on a duplicated filename whose original has already been moved to cur/ has some validity. With a non-monotonic system clock, the Maildir delivery protocol as specified by D. J. Bernstein can inject the same filename more than once into different subdirectories of a single Maildir tree. However, negative system clock skews are rare on a stable mail system, and PID randomization makes this outcome even less likely.

See the article on the Network Time Protocol for further information on system clock synchronization and issues which can lead to a misbehaved system clock.

Software that supports Maildir Directly

Mail servers

Delivery agents

Mail readers

Mail index and search tools

  • mairix builds an incremental database for Maildirs
  • Beagle can index Maildirs and many other information storage formats

Software That Supports Maildir by Implication

The list of software that can be used with Maildir is in fact much larger once you consider how programs can be plugged together and the role of network access protocols.

For example:

  • The Sendmail MTA does not itself support any mail delivery format (although many assume that it does). Sendmail uses a separate delivery program called mail.local. Procmail (and other programs that support Maildir) can be used in place of mail.local, so Sendmail can rightly be said to support Maildir as much as it supports any other format.
  • Many mail readers do not support Maildir but do support remote access protocols such as IMAP. Since several IMAP mail stores support Maildir, any IMAP-capable mail reader, such as Microsoft Outlook or Pine, can be used to access Maildir folders.
  • Fetchmail does not support Maildir (or any local delivery format), but since it forwards retrieved mail to an SMTP server, any of the mail servers listed above can be used to deliver mail from Fetchmail to Maildirs.

Windows Software

The Maildir specification cannot be implemented without modification on systems running Microsoft Windows, which does not allow colons in filenames. There is no technical reason why software on Windows cannot use an alternative separator (such as ";" or "-"), but with no way of updating the specification, there has been no agreement on which character it should be. As a result, one Windows program may write Maildir files that are unreadable to another Windows program. Programs that support Maildir, whether written in languages such as Python and Perl or ported from Unix using Cygwin or similar systems, could function reliably together if this issue were addressed.
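
CPython's standard mailbox module shows one workaround: its Maildir class has a colon attribute naming the separator, and its documentation mentions '!' as a popular choice. The Python sketch below (add_seen_message is a hypothetical helper) stores a read message with '!' in place of ':'; a program expecting a different separator would not recognize this file's flags:

```python
import mailbox
import os
import tempfile


def add_seen_message(path: str, separator: str) -> list[str]:
    """Store one read ('S'-flagged) message and return the names in cur/."""
    md = mailbox.Maildir(path, create=True)
    md.colon = separator  # overrides the class default ':' for this instance only
    msg = mailbox.MaildirMessage("Subject: hello\n\nbody\n")
    msg.set_subdir("cur")  # a message that has been seen belongs in cur/
    msg.set_flags("S")     # info becomes "2,S"
    md.add(msg)
    return os.listdir(os.path.join(path, "cur"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        names = add_seen_message(os.path.join(root, "Maildir"), "!")
        print(names[0].endswith("!2,S"))  # True
```

This is a per-program setting, not an agreed convention, so it illustrates the interoperability problem as much as it solves it.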

Notes and References

  1. Bernstein, Daniel J. (1995). "Using maildir format". Specifies the Maildir format and how programmers should use it.
  2. Varshavchik, Sam (1998). "Maildir++ and Maildir quotas". Contains the Maildir++ specification within it.

See also

External links