Notes on email transport

DISCLAIMER: This site is a mirror of the original one, which was once available at http://iki.fi/~tuomov/b/

1. The usual mailbox formats are crap. The mbox format is cumbersome and dangerous with multiple simultaneous accesses, and operations such as delete can be very slow. The Maildir format avoids some of the pitfalls that the mbox format has, but it is dog-slow. It can take forever to read in a mailbox with even a few hundred messages. There needs to be a better mailbox format, not proprietary to a single user-agent, but one that all programs can use, and one that the mail delivery agent also uses.

2. In fact, the mail delivery agent should simply save all mail it receives in a single (user-specific) mail database, and only after the mail is safely there, do anything with it. I hate procmail sorting kludges. I hate procmail. I've lost mail thanks to it deciding to dump everything into /dev/null because bogofilter died and provided no output. Only after the email is safely in the database should spam taggers apply spam tags to it. Old spam can be deleted by a cronjob, to keep the space demands low. Alternatively, the mail delivery agent should expect only new tags/headers from the taggers it runs, instead of completely filtered output, and unless one of the tags it gets instructs it to delete or forward the mail, it should put the email along with the new tags in the database. To conserve space, it should be possible to save only the important headers of mail that is highly likely to be spam, but even spam shouldn't be completely deleted. There must be no point of failure that could cause any mail to go completely into /dev/null, unless very explicitly so instructed.
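
A minimal sketch of the store-first idea, assuming an SQLite file as the per-user database. The table layout, the `deliver` function and the tagger interface (a callable that returns new tags and never sees the stored copy) are all made up for illustration; nothing here is an existing delivery agent's API.

```python
import sqlite3
from email import message_from_bytes


def open_db(path=":memory:"):
    """Open (or create) the hypothetical per-user mail database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail ("
        "id INTEGER PRIMARY KEY, raw BLOB NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "mail_id INTEGER NOT NULL REFERENCES mail(id), "
        "tag TEXT NOT NULL, UNIQUE(mail_id, tag))"
    )
    return db


def deliver(db, raw, taggers):
    """Store the raw message first, then let taggers add tags to it."""
    # Step 1: the message is committed before any tagger runs.
    with db:
        cur = db.execute("INSERT INTO mail (raw) VALUES (?)", (raw,))
        mail_id = cur.lastrowid

    # Step 2: taggers only contribute tags. A tagger that dies or returns
    # nothing cannot touch the stored message; the failure becomes a tag.
    msg = message_from_bytes(raw)
    for name, tagger in taggers:
        try:
            new_tags = list(tagger(msg))
        except Exception:
            new_tags = ["tagger-failed:" + name]
        with db:
            db.executemany(
                "INSERT OR IGNORE INTO tags (mail_id, tag) VALUES (?, ?)",
                [(mail_id, tag) for tag in new_tags],
            )
    return mail_id
```

With this shape, a crashed spam filter leaves a `tagger-failed:...` tag on the message instead of sending it to /dev/null, and deleting old spam remains a separate, explicit job.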

3. Indeed, all mail should go and remain in a single, proper database, instead of being sorted into separate mailboxes. The MUA can look up mailing list mail by addresses in the headers, or by tags that have been applied to the emails. Currently, it sucks having to configure both procmail and mutt to be aware of mailing lists. It should be necessary to configure mailing lists and other sorting in only one place, either the delivery agent or the user agent(s), not both. The latter also include programs like from and the shell's notifier, which in my usage should ignore mailing list posts.
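
As a rough illustration of looking up list mail from the headers alone, the sketch below relies on the standard `List-Id` header (RFC 2919), which most list software adds. The helper names `list_id` and `worth_notifying` are invented here; a real setup might equally use tags or the To/Cc addresses.

```python
from email import message_from_string


def list_id(msg):
    """Return the identifier inside <...> of the List-Id header, or None."""
    value = msg.get("List-Id")
    if not value:
        return None
    start, end = value.rfind("<"), value.rfind(">")
    if start != -1 and end > start:
        return value[start + 1:end].strip()
    return value.strip()


def worth_notifying(messages):
    """What a from-like tool or shell notifier would report: non-list mail."""
    return [m for m in messages if list_id(m) is None]
```

Any user agent sharing this one rule would agree on what counts as list mail, with no second copy of the list configuration in the delivery agent.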

4. Actually, I very much prefer reading mailing lists through nntp://news.gmane.org/. That way I don't have to subscribe until I have something to say. But when I finally have something to say, subscription is still too cumbersome, but thanks to the spam problem, necessary. I just wish mailing list software could send back a subscription confirmation request (with delivery disabled, because obviously you don't want it, if you post unsubscribed) or some other test (like gmane itself does, when you try to post for the first time on a list through it) if you attempt to post unsubscribed. If this request is responded to within a suitable timeframe, it should deliver the quarantined post.
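
The quarantine-and-confirm behaviour wished for above could look roughly like this. It is only a sketch: the `Quarantine` class, its method names and the token scheme are assumptions, and actually mailing the confirmation request is left out.

```python
import secrets


class Quarantine:
    """Hold posts from unsubscribed senders until they confirm in time."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = {}  # token -> (sender, post, deadline)

    def hold(self, sender, post, now):
        """Quarantine a post; the returned token goes into the request."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, post, now + self.timeout)
        return token

    def confirm(self, token, now):
        """Return the post for delivery if confirmed in time, else None."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, post, deadline = entry
        if now > deadline:
            return None  # too late: the post stays undelivered
        return post
```

The caller would subscribe the sender with delivery disabled at the point where `confirm` returns the post.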

5. Maybe, to make subscription easier, and to not depend on something like gmane, there should be something RSS-like for mailing lists. RSS is a bit too ad hoc and inefficient for mailing lists and sites with a high volume of posts, as it has to be polled frequently to not lose any posts, but it should be possible to “chain” multiple files so that less frequent polls don't lose anything and higher-frequency polls don't clog the network. Perhaps there should be separate feeds for hourly and daily polls, and the latter digest feeds should instruct when it is the best time to poll.
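
One possible reading of “chaining”: every feed file lists its posts newest first and names the next older file, so a reader follows the chain only as far back as the last post it has seen. The file layout (`posts`, `prev`, `id`) and the `poll` function are hypothetical, not part of RSS or any existing format.

```python
def poll(fetch, head, last_seen):
    """Return posts newer than last_seen, oldest first.

    fetch(name) returns one feed file as a dict with "posts" (newest
    first) and "prev" (name of the next older file, or None).
    """
    new = []
    name = head
    while name is not None:
        chunk = fetch(name)
        for post in chunk["posts"]:
            if post["id"] == last_seen:
                new.reverse()
                return new  # caught up; older files are never fetched
            new.append(post)
        name = chunk["prev"]
    new.reverse()
    return new  # reached the end of the chain without seeing last_seen
```

A frequent poller usually fetches only the small head file, while an infrequent one walks a few older files and still misses nothing.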

6. Did I mention that I hate procmail? It is too error-prone, as well as cryptic. No, none of the alternatives seem any better, and I don't really have the choice, as I'd rather not receive mail on a system I have to administer. Besides, I'd rather not connect to my own box from any random place, although to use GPG that is necessary (supposing anyone used it). Indeed, it should be easier to synchronise mailboxes between systems. There's OfflineIMAP, but it is a kludge that demands a centralised IMAP mailbox to which everything is synchronised. I want to have an archival mailbox on my home computer that is not symmetrically synchronised with the receiving and other mailboxes. If an email is deleted in the other mailboxes, it is still not deleted from the archival mailbox, whereas when an email is tagged as having been “done with” (not the same as read, but more like moved from $MAIL to somewhere else) in any mailbox, it gets deleted from the other mailboxes, except the archival mailbox. And so on. It must not be necessary to be able to connect to this archival mailbox from the internet.
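
The asymmetric rules can be sketched with mailboxes modelled as plain dicts mapping a message id to its set of tags. Everything here is an assumption made for illustration: the `sync` function, the literal tag name `done`, and the reading that a “done with” message stays in the mailbox where it was tagged. The function is meant to run on the archive host, which pulls from the others, so nothing ever needs to connect to it.

```python
def sync(archive, mailboxes):
    """One pull-style pass, run from the archival host.

    archive:   dict of message id -> set of tags; it only ever grows.
    mailboxes: dict of mailbox name -> dict of message id -> set of tags.
    """
    # Rule 1: the archive copies whatever it sees and never deletes,
    # so a deletion elsewhere cannot remove anything from it.
    for box in mailboxes.values():
        for mid, tags in box.items():
            archive.setdefault(mid, set()).update(tags)

    # Rule 2: a message tagged "done" in one mailbox is removed from the
    # other non-archival mailboxes.
    for name, box in mailboxes.items():
        done = {mid for mid, tags in box.items() if "done" in tags}
        for other_name, other in mailboxes.items():
            if other_name != name:
                for mid in done:
                    other.pop(mid, None)
```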

7. Besides procmail being too untrustworthy, thanks to spam and crappy spam filters, you can these days trust email to be delivered and read even less than snail mail. Besides improving spam filters, there has to be some way to get ‘receipts’ of mail properly delivered and not tagged as spam. The problem is, the spammers could abuse this. Perhaps that could be mitigated by limiting the number of receipts sent, and when they're sent. Another improvement, limiting bouncing abuse, would be to give this receipt during the same SMTP connection that is used to transport the mail, but this demands integration of the mail delivery agent (that understands spam taggers) with the mail transport agent.

8. GnuPG/PGP webs of trust should also be helpful in fighting spam. The web of trust would form an automatic white list, up to some score. It should also be safer to send receipts for mail with a trusted signature. If only more people used signing and encryption, and the key signing conventions weren't so centred on IRL identification, so that it would be easier to obtain signatures from the people that you actually communicate with by email.

9. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) challenges are often criticised for being too cumbersome. I don't find this so, properly implemented. Certainly, if you have to reply to a CAPTCHA for every email you send, it is cumbersome – and that is the point – but people don't normally communicate with that many people by email, and a decent CAPTCHA implementation should put you on a whitelist after the first successful reply and not bug you anymore, if the receiver doesn't specifically instruct otherwise. Obviously, mailing lists also need special handling. Combined with webs of trust and widespread signing and encryption, the occasions for having to reply to a CAPTCHA test should further diminish. I'd much rather use CAPTCHA than crappy spam filters, if it only were more accepted and known, and perhaps integrated into the mail transport protocols, so that the CAPTCHA tests themselves wouldn't flood the internet. Of course, there's the possibility that CAPTCHA isn't efficient enough. I don't think CAPTCHA sweatshops are much of a threat – to keep up the current levels of spam, you'd need a significant fraction of mankind solely replying to the challenges – but it may be difficult to come up with enough diverse tests that any human recipient can still answer, but not computers. Another problem is all those rooted Windows boxes… and Linux and other *nix boxes, once the masses find them with the crappy and insecure user agents they've been taught to use.
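
The whitelist-after-first-success behaviour is simple enough to sketch. The `ChallengeGate` class and its method names are invented for this example, and it keys on the bare sender address, which is forgeable; a serious version would presumably key on a verified signature instead.

```python
class ChallengeGate:
    """Challenge unknown senders once, then leave them alone."""

    def __init__(self):
        self.whitelist = set()
        self.held = {}  # sender -> messages waiting for a solved challenge

    def receive(self, sender, message):
        """Return "deliver" for known senders, "challenge" otherwise."""
        if sender in self.whitelist:
            return "deliver"
        self.held.setdefault(sender, []).append(message)
        return "challenge"

    def solved(self, sender):
        """Whitelist the sender and release everything held for them."""
        self.whitelist.add(sender)
        return self.held.pop(sender, [])
```

Mailing list traffic and mail with a trusted signature would bypass `receive` entirely, which is where the special handling mentioned above would go.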

10. As for the micropayment schemes envisioned by some corporations to fight spam: no fucking way. These schemes shoot a fly with a cannon – in order to corner the insect repellent market.
