Prasenjeet Dutta, Cybernet Software Systems (bulk chaoszone org)
Karthik M Narayanaswami, National Institute of Technology and Science (mnk acm org)
(Note: This is a draft)
N.B. Hadmut Danisch's suggestion of using RMX records appears to be much better than the 'dnsmx' method suggested in this paper. Still, using the Origin-Server-Identity: dnsmx header (or should that be rmx?) does no harm; in fact, it helps existing SMTP servers that have the appropriate plugins installed to process the email and mark it as verified.
The rise of Unsolicited Commercial Email ("spam") has been one of the more distressing features of the online revolution over the last 15-odd years. Old, widely deployed protocols, which were designed for a largely military/academic environment, are today deployed in an actively hostile network populated by black-hats, commercial interests that are unaware of or actively ignore netiquette, and so on.
Other old protocols, such as FTP, have been abused as well. A rash of FTP vulnerabilities has forced vendors to lock down their offerings and audit their code, and many users with confidential information to transmit avoid FTP altogether (except for the rare FTP-over-SSL user), opting for sftp/scp instead.
SMTP, possibly the last "old protocol" still in (very!) active use, has not fared as well. Despite the excellent work of several organizations, spam continues to be a problem on the Internet, as almost every email user will testify. So far, the bulk of the work done on spam-reduction has been filter-oriented: mainly at the MUA, but increasingly at the MTA as well.
Spam wastes bandwidth. Every time a spammer spews millions of email messages onto the Internet, destination servers consume bandwidth processing them. Some studies show that spam accounts for up to 40% of the bandwidth used by sites running SMTP, so this is not an insignificant factor. Users of MUAs that filter spam also waste bandwidth by downloading it before it is filtered. While this may be less of a problem for broadband users and those with free local calls, users in areas where local calls are metered, and broadband users with download caps, may feel a sense of chagrin at the amount of spam they receive.
Spam wastes money. Apart from the cost of bandwidth (which is not negligible), organizations typically spend money, engineering resources and hardware on adding filtering options to MTAs and MUAs. For non-technical organizations, support costs must also be factored in.
Spam impacts productivity. A user who has to sift through 50 pieces of spam a day to find a legitimate email is a user whose time could be better utilized.
Spam impacts legitimate advertisers. Legitimate ("opt-in") advertisements and newsletters have the greatest chance of being "lost in the clutter" of a spam-ravaged inbox, so fewer people actually read legitimate messages.
Spam aids fraud. A significant percentage of spam is used to carry messages for fraudulent business practices. Law-enforcement officials have set up cells to address these problems, but while no widely deployed medium can realistically be assumed to be free of crime, clearly the technical community can do better to reduce the affinity that fraudsters have for e-mail.
Spam is a security problem. Mass-mailing viruses ("Melissa" and its successors) send large numbers of messages, typically to everyone listed in the host system's address book, and newer ones implement SMTP internally to do so.
The economics of spam are such that even a very low response rate is profitable for spammers. It is widely postulated that people who reply to spam tend to be gullible and credulous, as well as unaware that scammers exist on the Internet. While I have been unable to find a clear correlation, it seems likely that such people would be the least likely to install sophisticated filtering systems on their MUAs.
Filtering at the MTA is not the answer either, especially for MTAs that service large numbers of users. The diversity of email users ensures that what is spam to engineering, for instance, is manna for the marketing department. When an MTA's users span continents (as in the case of Hotmail), this poses an even bigger problem. MTA filters can, at best, filter out "obvious" spam, such as Multi-Level Marketing offers and Nigerian-style scams.
Also, as Paul Graham and others have noted, filtering is an arms race between the spammer and the receiver: the spam of the future is likely to consist of very innocuous messages followed by dubious hyperlinks. Further, some spam is merely "tracer" email that attempts to fish out email addresses to add to spammer databases; the senders of these would not care a whit whether they were filtered or not.
Filters also do not effectively address the problem of wasted bandwidth, since filtering happens after the fact. Finally, there is a large class of users (such as those in marketing or sales) for whom false positives caused by filtering software can be very harmful, and there are even users (typically in government and at abuse desks) for whom filtering is illegal or infeasible.
The motivation behind this paper was my firm belief that bolting on filtering solutions was not the answer: a lower-level solution (i.e., at the application protocol level) had to exist. To be acceptable to the Internet community, any proposed solution would also have to preserve the desirable characteristics of email, including ubiquity and ease of use, without introducing new privacy concerns. Further, any solution would have to offer "side-by-side compatibility" with existing, widely deployed email infrastructure (both MUAs and MTAs) and avoid a radical break from the existing design of SMTP servers, so that implementers do not find it difficult to migrate to the new proposal.
In this paper, we present a protocol description, built on top of SMTP, that we believe fulfills these criteria. There is nothing technically novel in this paper; this is deliberate, because spam is no more a technical problem than littering is. We also wished to invent no new technology, and instead to harness existing technologies to provide a solution. However, we believe the paper does present a workable, well-thought-out strategy for combating spam without entering into a filter-based arms race. Ideally, it would complement filtering MUAs to present a clean inbox to the user.
Sources of email spam include:
Sources (1) and (2) have declined over the last few years, as the Internet community has aggressively "boycotted" such ISPs and mail servers. Solutions exist today that track down known spammers' net-blocks and put them onto distributed real-time blacklists. Also, SMTP server vendors have done a good job of ensuring that their products do not run as open relays out of the box. While there still exist commercial organizations that exploit legal loopholes, these can be dealt with through other means. In fact, faced with boycotts, many spammers are resorting to spoofing headers and forging their identity in order to ply their trade. It is this category of spammers (i.e., spoofers and forgers) that this paper primarily targets.
Sources (3) and (4) present a far more knotty problem. They can be tracked via distributed blacklists, but it quickly turns into a game of whack-a-mole. Tarring entire net-blocks instead of IP addresses often results in innocent bystanders being "shot" by the blacklisting entity, causing them harm (there have been cases where affected parties have ended up in court).
Most proposed solutions to (3) and (4) involve blocking egress on port 25. This is a shockingly bad idea. In particular, it discriminates unfairly between people who have full control over their own networks (typical in academia and industry) and people who don't (typically home users behind a broadband connection, many of whom are technically sophisticated and responsible). At some point, the industry will have to decide: should those who use broadband in a non-business context, such as at home, who have connectivity good enough for streaming webcam video, and who are the next growth wave for ISPs, be treated as second-class netizens when it comes to running a mail server? Our contention is "no": saying that only an elite few can run mail servers is no more acceptable than saying that only an elite few can run packet sniffers and hex editors.
Another set of solutions involves sending mail signed with PGP or S/MIME. This is impractical on a large scale, because many users use MUAs where PGP or S/MIME is unavailable (e.g. webmail), or are not technically sophisticated enough to grasp the intricacies of PGP or S/MIME key management and use. As is well known, user-friendliness and security are often conflicting goals; hence we believe that "dumbing down" S/MIME for the masses will serve no particular purpose.
Clearly, the problem of "spam" can be attributed to rogue SMTP servers that insist on spewing spam onto the Internet. Because these spam servers are essentially untraceable and unidentifiable (the traceable and identifiable ones are subject to community boycott anyway), spam continues unabated. On the other hand, if the identities of individual mail servers could be positively determined, we would go a long way toward assigning responsibility for particularly blatant acts of email abuse. Here we present a set of SMTP extension headers that allow this. Importantly, our method is transparent to the end user, in the sense that she does not have to do anything actively to block mail from unverifiable senders.
The problem is similar (but not identical) to the one e-commerce providers faced with HTTP some years ago, which was solved by using SSL to build the HTTPS protocol. At the heart of the HTTPS trust model is a system of chained certificates issued (typically for a fee) by organizations like VeriSign, usually after a credit-card check. In SMTP's case, however, many non-profits, government entities, etc. use email for non-commercial purposes, so a credit check clearly cannot be the only basis of trust (cf. Sec. 3).
We have used language from RFC 2440's hierarchical trust model in our examples below for simplicity, but the specification is open to using other trust models, including X.509 Certificates.
A typical SMTP exchange is illustrated below. O stands for Origin Server, i.e. the server transferring the mail, and T for desTination Server, the server receiving the mail (T need not be the final destination of the mail; it could merely be relaying it on behalf of O).
    T: 220 gc.net ESMTP Service (Domino/5.0.9a) ready at Wed, 22 Jan 2003 09:13:26 -0800
    O: HELO europa.cybernetsoft.com
    T: 250 gc.net Hello europa.cybernetsoft.com ([209.10.58.222]), pleased to meet you
    O: MAIL FROM: <pd@europa.cybernetsoft.com>
    T: 250 pd@europa.cybernetsoft.com... Sender OK
    O: RCPT TO: <pd@gc.net>
    T: 250 pd@gc.net... Recipient OK
    O: DATA
    T: 354 Enter message, end with "." on a line by itself
    O: Received: from [127.0.0.1] by europa
    O:     (Exim version 3.12 #1); Wed, 22 Jan 2003 07:10:06
    O: Message-ID: <002b01c2c1b7$30889ab0$1c01010a@europa>
    O: From: "PD" <pd@europa.cybernetsoft.com>
    O: To: "Prasenjeet Dutta" <pd@gc.net>
    O: Subject: Typical SMTP Exchange
    O: Date: Wed, 22 Jan 2003 07:10:04 +0530
    O: MIME-Version: 1.0
    O: Content-Type: text/plain;
    O:     charset="iso-8859-1"
    O: Content-Transfer-Encoding: 7bit
    O:
    O: This is a test.
    O:
    O: (message ends next line)
    O: (this space intentionally left blank)
    O: .
    T: 250 Message accepted for delivery
    O: QUIT
We propose the following changes to this exchange: directly below its own Received header, the Origin Server inserts an Origin-Server-Identity header naming the verification method and, for the 'public-key' method, Origin-Server-Key and Origin-Server-Signature headers. In doing so, we have not tampered with SMTP's underlying envelope/content structure. Our primary reasons are, first, backward compatibility ("introduce no new commands") and, second, giving Destination Servers the ability to perform Origin Server Verification "lazily" (Destination Servers that wish to perform Origin Server Verification on the fly can, of course, do so as well).
The structure of the headers discussed above is given in Sec. 4.
    T: 220 gc.net ESMTP Service (Domino/5.0.9a) ready at Wed, 22 Jan 2003 09:13:26 -0800
    O: HELO europa.cybernetsoft.com
    T: 250 gc.net Hello europa.cybernetsoft.com ([209.10.58.222]), pleased to meet you
    O: MAIL FROM: <pd@europa.cybernetsoft.com>
    T: 250 pd@europa.cybernetsoft.com... Sender OK
    O: RCPT TO: <pd@gc.net>
    T: 250 pd@gc.net... Recipient OK
    O: DATA
    T: 354 Enter message, end with "." on a line by itself
    O: Received: from localhost by europa
    O:     (Exim version 3.12 #1); Wed, 22 Jan 2003 07:10:06
    O: Origin-Server-Identity: public-key;
    O:     europa.cybernetsoft.com (209.10.58.222)
    O: Origin-Server-Key: <mailto:osk+europa@cybernetsoft.com>
    O: Origin-Server-Signature: rfc2440; encoding=base64
    O:     iQA/AwUAPi2tY1VioDO/jwwhEQIyrACg6HYQDh+ynXbfqSp+4hF3kfb6zQIAnRYN
    O:     Ca1gPsBiRizLdYbtci4yVJRziQA/AwUAPi2tY1VioDO/jwwhEQIyrACg6HYQDh+y
    O:     nXbfqSp+4hF3kfb6zQIAnRYNCa1gPsBiRizLdYbtci4yVJRz
    O:     =1cuV
    O: Message-ID: <002b01c2c1b7$30889ab0$1c01010a@europa>
    O: From: "PD" <pd@europa.cybernetsoft.com>
    O: To: "Prasenjeet Dutta" <pd@gc.net>
    O: Subject: Typical SMTP Exchange
    O: Date: Wed, 22 Jan 2003 07:10:04 +0530
    O: MIME-Version: 1.0
    O: Content-Type: text/plain;
    O:     charset="iso-8859-1"
    O: Content-Transfer-Encoding: 7bit
    O:
    O: This is a test.
    O:
    O: (message ends next line)
    O: (this space intentionally left blank)
    O: .
    T: 250 Message accepted for delivery
    O: QUIT
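As a rough illustration of where the new headers go, the Python sketch below inserts them directly below the Origin Server's own Received header, matching the transcript above. It assumes the message headers are available as a list of lines (folded continuation lines start with a space or tab) and that the signature has already been produced by an OpenPGP (RFC 2440) tool; the function name `insert_osv_headers` is hypothetical and not part of the proposal.

```python
from __future__ import annotations


def insert_osv_headers(lines: list[str], osv_headers: list[str]) -> list[str]:
    """Return a copy of `lines` with `osv_headers` placed directly below the
    topmost Received header (including that header's folded continuation lines).

    Assumes lines[0] is the Origin Server's own Received header.
    """
    if not lines or not lines[0].lower().startswith("received:"):
        raise ValueError("message must start with the Origin Server's Received header")
    end = 1
    # Skip the continuation lines that belong to the topmost Received header.
    while end < len(lines) and lines[end][:1] in (" ", "\t"):
        end += 1
    return lines[:end] + osv_headers + lines[end:]


message = [
    "Received: from localhost by europa",
    "\t(Exim version 3.12 #1); Wed, 22 Jan 2003 07:10:06",
    'From: "PD" <pd@europa.cybernetsoft.com>',
]
osv = [
    "Origin-Server-Identity: public-key;",
    "\teuropa.cybernetsoft.com (209.10.58.222)",
    "Origin-Server-Key: <mailto:osk+europa@cybernetsoft.com>",
    # The Origin-Server-Signature header (from an OpenPGP tool) would follow here.
]
result = insert_osv_headers(message, osv)
for line in result:
    print(line)
```

Keeping the insertion point tied to the first Received header is what lets a Destination Server later find these headers with a simple top-down scan.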
Let us now examine another method: verification via DNS MX records. The rationale here is that DNS records themselves provide a high degree of accountability, so cryptographic verification is not needed. (This approach is vulnerable to DNS spoofing; however, with modern DNS servers we consider this risk marginal.) While this method is not as elegant as the other, it is an optimization for a very common case: the thousands of legitimate email servers that DNS designates as mail exchangers. It is also potentially much faster for the Destination Server than a cryptographic 'verify' operation. In a way, it formalizes the current practice of performing an RDNS lookup, but is slightly more restrictive, since it verifies only Origin Servers that hold valid MX records (as opposed to accepting anyone for whom an RDNS lookup succeeds).
    T: 220 gc.net ESMTP Service (Domino/5.0.9a) ready at Wed, 22 Jan 2003 09:13:26 -0800
    O: Received: from localhost by europa
    O:     (Exim version 3.12 #1); Wed, 22 Jan 2003 07:10:06
    O: Origin-Server-Identity: dnsmx;
    O:     cybernetsoft.com
    O: From: "PD" <pd@europa.cybernetsoft.com>
    O: To: "Prasenjeet Dutta" <pd@gc.net>
If the Origin-Server-Identity method inserted by the Origin Server is 'dnsmx', then it MUST insert the domain name for which it is an MX (for the current transaction) into the Origin-Server-Identity header (the syntax is specified in Sec. 4). No other action is necessary.
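The paper does not spell out the Destination Server's side of the 'dnsmx' check. A minimal sketch, assuming the check means "the SMTP client's IP address must belong to one of the MX hosts of the domain named in the header", might look like this. The DNS lookups are passed in as plain functions (hypothetical `mx_lookup` and `addr_lookup`) so the example runs without network access; the sample records are canned data modeled on the transcript, not real DNS answers.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical resolver signatures, supplied by the caller.
MxLookup = Callable[[str], Iterable[str]]    # domain -> MX host names
AddrLookup = Callable[[str], Iterable[str]]  # host name -> IP addresses


def verify_dnsmx(claimed_domain: str, peer_ip: str,
                 mx_lookup: MxLookup, addr_lookup: AddrLookup) -> bool:
    """Return True if peer_ip is an address of one of claimed_domain's MX hosts.

    claimed_domain comes from the Origin-Server-Identity: dnsmx header;
    peer_ip is the SMTP client address recorded during the session.
    """
    domain = claimed_domain.strip().rstrip(".").lower()
    for mx_host in mx_lookup(domain):
        if peer_ip in set(addr_lookup(mx_host)):
            return True
    return False


# Canned DNS data standing in for live lookups.
fake_mx = {"cybernetsoft.com": ["europa.cybernetsoft.com"]}
fake_a = {"europa.cybernetsoft.com": ["209.10.58.222"]}

ok = verify_dnsmx("cybernetsoft.com", "209.10.58.222",
                  lambda d: fake_mx.get(d, []), lambda h: fake_a.get(h, []))
forged = verify_dnsmx("cybernetsoft.com", "198.51.100.7",
                      lambda d: fake_mx.get(d, []), lambda h: fake_a.get(h, []))
print(ok, forged)  # True False
```

Injecting the lookups also makes it easy to swap in a caching resolver, which matters if this check runs on every incoming message.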
The extra information present in the headers is to be used by the Destination Server for a process we call "Origin Server Verification" (OSV), a process of ensuring the Origin Server is a 'responsible' SMTP server.
The Destination Server does not have to do anything special until the SMTP DATA command is transmitted. It MUST then enter a state in which it scans the headers in the incoming data stream (this task can be delegated to a separate module, as several mail servers do today with tools like MIME-Defang).
For purposes of OSV, the Destination Server should scan from the top of the data stream (which is usually a Received header) until an Origin-Server-Identity header is found, or another Received header is found, or a From header is found. The idea is that the Destination Server should perform OSV only on the last mail server in the chain. Mail for which this scan does not find an Origin-Server-Identity header should be marked as Mail of Unverifiable Origin (MUO).
If the identity verification mechanism specified by the Origin-Server-Identity header is 'public-key', then the Origin-Server-Key and Origin-Server-Signature headers should also be present immediately following the Origin-Server-Identity header in sequence, else the mail should be marked as Mail of Unverifiable Origin (MUO).
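The scanning rules above can be sketched in Python as follows. This is an illustration under assumptions: the headers arrive as a list of lines with folded continuation lines starting with whitespace, the first header is the Origin Server's Received header, and the helper names `unfold` and `scan_for_osv` are hypothetical.

```python
from __future__ import annotations


def unfold(lines: list[str]) -> list[tuple[str, str]]:
    """Split header lines into (name, value) pairs, joining folded lines.
    Stops at the blank line that separates headers from the body."""
    headers: list[tuple[str, str]] = []
    for line in lines:
        if line == "":
            break
        if line[:1] in (" ", "\t") and headers:
            name, value = headers[-1]
            headers[-1] = (name, (value + " " + line.strip()).strip())
        elif ":" in line:
            name, value = line.split(":", 1)
            headers.append((name.strip(), value.strip()))
    return headers


def scan_for_osv(lines: list[str]) -> tuple[str, list[tuple[str, str]]]:
    """Return (method, osv_headers) for the last hop, or ("MUO", [])."""
    headers = unfold(lines)
    seen_received = False
    for i, (name, value) in enumerate(headers):
        lname = name.lower()
        if lname == "origin-server-identity":
            method = value.split(";", 1)[0].strip().lower()
            if method == "public-key":
                # Key and Signature headers must follow immediately, in order.
                following = [n.lower() for n, _ in headers[i + 1:i + 3]]
                if following != ["origin-server-key", "origin-server-signature"]:
                    return "MUO", []
                return method, headers[i:i + 3]
            return method, headers[i:i + 1]
        if lname == "from":
            break  # reached the message's own headers without finding OSV data
        if lname == "received":
            if seen_received:
                break  # an earlier hop: OSV applies only to the last server
            seen_received = True
    return "MUO", []


msg = [
    "Received: from localhost by europa",
    "\t(Exim version 3.12 #1); Wed, 22 Jan 2003 07:10:06",
    "Origin-Server-Identity: dnsmx;",
    "\tcybernetsoft.com",
    'From: "PD" <pd@europa.cybernetsoft.com>',
]
print(scan_for_osv(msg))
# ('dnsmx', [('Origin-Server-Identity', 'dnsmx; cybernetsoft.com')])
```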
What the Destination Server does with MUO is beyond the scope of this paper, but indicative choices would be to try an RDNS lookup if enough information is present in the Received header, to run the mail through a Bayesian spam filter, to use a mail processor to ensure MUAs treat it with low priority (such as by directing it to a "Junk Mail" folder), or simply to delete such mail.
Destination Servers that scan for OSV on the fly could also drop the connection upon encountering an MUO, but this is not recommended until the infrastructure to deal with OSV-tagged mail is very widely deployed, which will in all likelihood take years.
The Destination Server MUST perform Origin Server Verification (OSV) via one of two methods: either cryptographic verification of trust ('public-key'), or a verifiable reference from DNS' MX records ('dnsmx'). Other methods MAY be allowed as extension mechanisms.
The Destination Server can process mail marked for OSV in several ways: either by 'lazy' evaluation using a separate module or plug-in, or on the fly. Once a message passes OSV, local policy at the Destination Server can be used to determine how the message should be further handled. For example, a Destination Server may be configured to mark mail coming from a known mailing list as low-priority.
Mail that fails OSV should be tagged MUO.
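One way to organize the two mandatory methods plus any extension methods is a table of verifiers keyed by method name, with anything unknown or failing falling through to MUO. The sketch below assumes a hypothetical `run_osv` function and treats verifier errors as failures; both are illustrative choices, not requirements of the proposal.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical interface: a verifier receives the OSV headers found by the
# header scan and returns True if the Origin Server checks out.
Verifier = Callable[[list], bool]


def run_osv(method: str, osv_headers: list, verifiers: dict[str, Verifier]) -> str:
    """Return "PASS" if the registered verifier for `method` succeeds, else "MUO"."""
    verifier = verifiers.get(method)
    if verifier is None:
        # No built-in or extension mechanism is registered for this method.
        return "MUO"
    try:
        return "PASS" if verifier(osv_headers) else "MUO"
    except Exception:
        # Policy choice in this sketch: a verifier error (bad key, DNS failure)
        # counts as a failed verification.
        return "MUO"


verifiers = {
    "dnsmx": lambda hdrs: True,        # stand-in for a real DNS MX check
    "public-key": lambda hdrs: False,  # stand-in for a real signature check
}
print(run_osv("dnsmx", [], verifiers))       # PASS
print(run_osv("public-key", [], verifiers))  # MUO
print(run_osv("x-future", [], verifiers))    # MUO
```

A lazy-evaluation module might prefer to retry transient DNS errors rather than tag the mail MUO immediately; that is a local policy decision.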
The 'public-key' method is the preferred method, as it allows server identity verification under a variety of conditions. The algorithm for this would be:
If all the steps listed above complete successfully, the Destination Server MUST treat the message as having passed OSV.
Mail Server Verifiers have a crucial role to play in our proposed system, analogous to the role played by Certificate Authorities (CAs) in HTTPS transactions. For this proposal to work, it is necessary that a number of "Root" Mail Server Verifiers are established, who can then sign Intermediate Mail Server Verifiers' keys with exportable signatures. Intermediate Mail Server Verifiers would sign individual OSKs with non-exportable signatures.
Mail Server Verifiers would be expected to enter into a contract with the administrators of mail servers with OSKs that would prohibit the administrators from sending bulk mail except under stringent guidelines, on penalty of revocation of the signature. The details of this contract and other legal details are beyond the scope of this paper, except to note that the contract should follow the best practices commonly followed by responsible ISPs.
At least some Mail Server Verifiers, we feel, should be not-for-profit organizations that can sign OSKs for organizations at little or no cost, so that the cost of running an OSK-enabled mail server is not unduly high.
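The signing arrangement above (Root verifiers sign Intermediate verifiers' keys with exportable signatures, Intermediates sign individual OSKs, and a signature can be revoked) can be modeled as a small graph check. The sketch below is a simplified model, not an OpenPGP implementation: key IDs are plain strings, each signature is assumed to have already been verified cryptographically, and only the two-level Root, Intermediate, OSK chain described here is accepted. The names `Signature` and `osk_is_trusted` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    signer: str          # key ID of the signing Mail Server Verifier
    subject: str         # key ID being certified
    exportable: bool     # OpenPGP-style exportable flag
    revoked: bool = False


def osk_is_trusted(osk: str, roots: set[str], sigs: list[Signature]) -> bool:
    """Accept an OSK signed by an Intermediate that a Root certified with an
    exportable signature (Root -> Intermediate -> OSK); ignore revoked ones."""
    live = [s for s in sigs if not s.revoked]
    intermediates = {s.subject for s in live if s.signer in roots and s.exportable}
    return any(s.subject == osk and s.signer in intermediates for s in live)


sigs = [
    Signature("root-A", "intermediate-1", exportable=True),
    Signature("intermediate-1", "osk-europa", exportable=False),
    Signature("intermediate-1", "osk-spammer", exportable=False, revoked=True),
]
print(osk_is_trusted("osk-europa", {"root-A"}, sigs))   # True
print(osk_is_trusted("osk-spammer", {"root-A"}, sigs))  # False (revoked)
```

Revocation is what gives the contract described above its teeth: once a verifier revokes its signature, the OSK no longer chains to a Root.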
Origin-Server-Identity-Header ::=
<[sol]> "Origin-Server-Identity: " <method> ";" <[eol]>
(<[tab]> <server> "," <[eol]>)*
<[tab]> <server> <[eol]>
<server> ::= <[name]> "(" <[ip_address]> ")"
<method> ::= "public-key" | "dnsmx" | <other>
<[tab]> ::= ASCII tab character
<[sol]> ::= start-of-line
<[eol]> ::= network end-of-line (crlf)
<[name]> ::= an RFC 1123-style FQDN
<[ip_address]> ::= a dotted-quad address (or RFC 2732 address for IPv6)
<other> ::= token for future use
Origin-Server-Key-Header ::=
<[sol]> "Origin-Server-Key: " "<" <[uri]> ">" <[eol]>
<[uri]> ::= an RFC 2396 URI (mailto URIs are allowed)
<[sol]> ::= start-of-line
<[eol]> ::= network end-of-line (crlf)
Origin-Server-Signature-Header ::=
<[sol]> "Origin-Server-Signature: " <sigtype> "; encoding=" <encoding> <[eol]>
(<[tab]> <sigcharseq> <[eol]>)+
<sigtype> ::= "rfc2440" | "x509" | <other>
<sigcharseq> ::= encoded signature
<encoding> ::= "base64" | <other>
<[tab]> ::= ASCII tab character
<[sol]> ::= start-of-line
<[eol]> ::= network end-of-line (crlf)
<other> ::= token for future use
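A parser for the Origin-Server-Identity grammar and a serializer for the Origin-Server-Signature header might look like the following sketch. Two assumptions go beyond the grammar: the IP address in a server entry is treated as optional, because the 'dnsmx' example gives only a domain name, and signature lines are wrapped at an assumed 64 characters. The sketch does not produce the ASCII-armor checksum line (=1cuV) shown in the earlier example, and the function names are hypothetical.

```python
from __future__ import annotations

import base64
import re

# <server> ::= <[name]> "(" <[ip_address]> ")"; the "(ip)" part is optional here.
SERVER_RE = re.compile(r"^(?P<name>[A-Za-z0-9.-]+)(?:\s*\((?P<ip>[^()\s]+)\))?$")
PREFIX = "Origin-Server-Identity: "


def parse_identity_header(lines: list[str]) -> tuple[str, list[tuple[str, str | None]]]:
    """Parse an Origin-Server-Identity header given as its (unfolded-free) lines."""
    first, rest = lines[0], lines[1:]
    if not first.startswith(PREFIX) or not first.rstrip().endswith(";"):
        raise ValueError("malformed Origin-Server-Identity line")
    method = first[len(PREFIX):].rstrip()[:-1].strip()
    servers: list[tuple[str, str | None]] = []
    for i, line in enumerate(rest):
        if not line.startswith("\t"):
            raise ValueError("server lines must start with a tab")
        text = line.strip()
        if i < len(rest) - 1:
            # Every server line except the last ends with a comma.
            if not text.endswith(","):
                raise ValueError("non-final server line must end with ','")
            text = text[:-1].rstrip()
        match = SERVER_RE.match(text)
        if match is None:
            raise ValueError(f"bad server entry: {text!r}")
        servers.append((match.group("name"), match.group("ip")))
    if not servers:
        raise ValueError("at least one server is required")
    return method, servers


def format_signature_header(signature: bytes, sigtype: str = "rfc2440",
                            width: int = 64) -> list[str]:
    """Build Origin-Server-Signature header lines from raw signature bytes."""
    if not signature:
        raise ValueError("the grammar requires at least one signature line")
    encoded = base64.b64encode(signature).decode("ascii")
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return [f"Origin-Server-Signature: {sigtype}; encoding=base64"] + ["\t" + c for c in chunks]


method, servers = parse_identity_header([
    "Origin-Server-Identity: public-key;",
    "\teuropa.cybernetsoft.com (209.10.58.222)",
])
print(method, servers)  # public-key [('europa.cybernetsoft.com', '209.10.58.222')]
```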
The OSV system for mail-server identification, we believe, will reduce spam by making spam-spewers much more accountable for their actions, while not impeding users who wish to run their own mail servers. It complements MUA spam-filtering techniques by filtering out bogus senders with a very high degree of accuracy. Finally, it does all this in a manner that is transparent to the end user, an important consideration for the millions of neophytes who depend on others for their email infrastructure.
Created: Jan 23 2003