A small primer on Greylisting email
by
on Monday, October 22, 2007
Mistah Wheelus requested I write an explanation we could give to our customers about greylisting. And here's my attempt at being lucid.
Greylisting is an easy and effective (so far) anti-spam technique (our current tests show an effective spam-stopping rate of over 97%) but before I explain how it works, I must first explain a bit how the email system works.
Once you click “send” your computer will connect with an outgoing email server (this is the “Outgoing SMTP Server” setting in the configuration of your email account); it then identifies itself to the outgoing email server, sends your email address (technically, the “sender address”), then the email address of who you are sending it to (the “recipient address”) and then finally, the actual email (which may have nothing to do with either the sender or recipient email address, a fact that spammers often exploit for fun and profit).
Once the outgoing email server has accepted the email, it is then queued up for final delivery.
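To make the bit about the sender and recipient addresses being separate from the actual email more concrete, here's a minimal Python sketch. The `Envelope` type and the `bank.example` addresses are made up for illustration, and nothing is actually sent; it only shows that the addresses handed to the server and the `From:`/`To:` lines inside the email are two different things.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Envelope:
    """What the sending computer tells the server, kept apart from the email itself."""
    sender: str            # the "sender address"
    recipient: str         # the "recipient address"
    message: EmailMessage  # the actual email, headers and body


def build_example() -> Envelope:
    msg = EmailMessage()
    # These headers live inside the email. Nothing in this sketch (and nothing
    # in the basic protocol) forces them to match the envelope addresses.
    msg["From"] = "Your Bank <security@bank.example>"
    msg["To"] = "Valued Customer <customer@bank.example>"
    msg["Subject"] = "Hello"
    msg.set_content("The body of the email.")
    return Envelope(
        sender="fred@example.net",
        recipient="sean@pickint.net",
        message=msg,
    )


envelope = build_example()
print("envelope sender:   ", envelope.sender)
print("envelope recipient:", envelope.recipient)
print("From: header:      ", envelope.message["From"])
print("To: header:        ", envelope.message["To"])
```

Python's own `smtplib` reflects the same split: `SMTP.send_message()` accepts `from_addr` and `to_addrs` arguments separately from the message it sends.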
Technobabble
Actually, I'm describing the most commonly used configuration, where all outgoing emails for an organization (or an ISP) are funneled through a so-called “relay host” or “smart host” because of security issues or as a means of preventing outgoing spam.
Some ISPs go so far as to block all outgoing email traffic from their subscriber base, only allowing connections to their outgoing email server.
If an outgoing email server isn't required, then your computer may very well connect directly to the server responsible for the recipient's email and deliver the email directly. But then, your computer becomes responsible for redelivery in case the recipient's server can't accept the email at that time.
There are more details of this in the next Technobabble section.
The outgoing email server will then look up where to send the email based upon the recipient's domain name, and once this is done, it connects to an incoming email server that handles email for the recipient and, using SMTP, delivers the email to the recipient's email box. And if for any reason the email can't be delivered, or there's an error during the delivery, the outgoing email server queues up the email for another attempt at a later time (which can be a few minutes to maybe an hour later). And this is an important detail to remember.
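Here's a rough Python sketch of that queue-and-retry behavior. It assumes the usual SMTP convention that a 2xx reply means the email was accepted, a 4xx reply means “try again later,” and a 5xx reply means “don't bother.” The function names, the `fake_server` stand-in and the attempt limit are all invented for the example, and a real server would wait between attempts rather than retrying straight away.

```python
from collections import deque


def classify_reply(code: int) -> str:
    """Map an SMTP reply code to what the sending server does next."""
    if 200 <= code < 300:
        return "delivered"
    if 400 <= code < 500:
        return "retry"      # temporary failure: keep it in the queue
    return "bounce"         # permanent failure: give up and tell the sender


def run_queue(messages, try_deliver, max_attempts=5):
    """Attempt each queued message, requeueing it on a temporary failure."""
    queue = deque((message, 1) for message in messages)
    results = {}
    while queue:
        message, attempt = queue.popleft()
        outcome = classify_reply(try_deliver(message, attempt))
        if outcome == "retry" and attempt < max_attempts:
            # A real server would hold the message for minutes before this.
            queue.append((message, attempt + 1))
        elif outcome == "retry":
            results[message] = "gave up"
        else:
            results[message] = outcome
    return results


def fake_server(message, attempt):
    """Hypothetical receiving server used only to exercise the queue."""
    if message == "to-nobody":
        return 550                        # no such mailbox
    return 451 if attempt == 1 else 250   # refuse once, then accept


results = run_queue(["to-sean", "to-nobody"], fake_server)
print(results)
```

The message that gets a temporary refusal goes back in the queue and succeeds on its second attempt; the one that gets a permanent refusal is dropped after the first.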
Technobabble
I glossed over quite a few details here. The computer sending the email to the recipient first does a DNS lookup for a special type of record, the MX record. This returns a list of servers that handle email for that domain. For instance, at this moment in time, the following servers handle incoming email for gmail.com:
| Server name | Server Priority |
|---|---|
| gmail-smtp-in.l.google.com | 5 |
| alt1.gmail-smtp-in.l.google.com | 10 |
| alt2.gmail-smtp-in.l.google.com | 10 |
| gsmtp163.google.com | 50 |
| gsmtp183.google.com | 50 |
The server(s) with the lowest priority number is checked first (the lower the number, the more preferred the server). If more than one server has the same priority, then one is picked randomly. So, in this case, if for some reason gmail-smtp-in.l.google.com is not responding, then the sending computer picks either alt1.gmail-smtp-in.l.google.com or alt2.gmail-smtp-in.l.google.com.
Oh, and what if there isn't an MX for the domain in question? Then the sending computer looks up the IP address associated with the domain and delivers the email to that machine.
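The selection rules above can be sketched in a few lines of Python. This doesn't do a real DNS lookup (Python's standard library has no MX lookup); it takes the list of records as already fetched, using the gmail.com table from this entry as sample data. The `delivery_order` function is my own name for it, not anything a real mail server exposes.

```python
import random


def delivery_order(mx_records, domain, rng=random):
    """Return the host names in the order a sending computer would try them.

    mx_records is a list of (host, priority) pairs. With no MX records at
    all, fall back to the domain itself, as described above.
    """
    if not mx_records:
        return [domain]
    by_priority = {}
    for host, priority in mx_records:
        by_priority.setdefault(priority, []).append(host)
    order = []
    for priority in sorted(by_priority):   # lowest number first
        hosts = list(by_priority[priority])
        rng.shuffle(hosts)                 # equal priority: pick randomly
        order.extend(hosts)
    return order


GMAIL_MX = [
    ("gmail-smtp-in.l.google.com", 5),
    ("alt1.gmail-smtp-in.l.google.com", 10),
    ("alt2.gmail-smtp-in.l.google.com", 10),
    ("gsmtp163.google.com", 50),
    ("gsmtp183.google.com", 50),
]

order = delivery_order(GMAIL_MX, "gmail.com")
for host in order:
    print(host)
```

The first host printed is always gmail-smtp-in.l.google.com; the two alt servers follow in either order, then the two gsmtp servers in either order.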
Once the email has been successfully delivered, it's then saved in the recipient's incoming email box, where it stays until the recipient retrieves it (which is beyond the scope of this entry).
Now, how does Greylisting fit into all of this?
Greylisting works on the recipient side of this. Send me an email, and eventually, some server from your end (“your server”) will contact the server on my end (“my server”) to deliver the email. My server then has three pieces of information: the IP address of your server, your email address (assuming it matches the sender address) and my email address. And for the sake of an example, let's say it's [ 3.4.5.6 , fred@example.net , sean@pickint.net ]. My server will see if it has seen that particular combination before, and if not, record it, and send back to your server “try again later.” And until it's been at least 25 minutes since my server first saw that particular combination, it will keep sending back “try again later.”
After the initial 25 minutes, any email from 3.4.5.6 with sender fred@example.net and recipient sean@pickint.net will be accepted. But other emails from 3.4.5.6 can still experience the delay, if the sender email address, recipient email address, or both, are different. It's the combination of all three pieces of information that has to match.
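For the curious, here's the whole idea as a minimal Python sketch. This is not the code running on any actual server; the `Greylist` class is invented for illustration, times are plain seconds passed in by the caller, and it leaves out things a real implementation needs, such as expiring old records.

```python
DELAY_SECONDS = 25 * 60  # the 25 minute delay used in this entry


class Greylist:
    def __init__(self, delay=DELAY_SECONDS):
        self.delay = delay
        self.first_seen = {}  # (ip, sender, recipient) -> time first seen

    def check(self, ip, sender, recipient, now):
        """Return "accept" or "try again later" for one delivery attempt."""
        key = (ip, sender, recipient)
        # Record the combination the first time it shows up.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "try again later"


greylist = Greylist()
triplet = ("3.4.5.6", "fred@example.net", "sean@pickint.net")

first_try = greylist.check(*triplet, now=0)
too_soon = greylist.check(*triplet, now=10 * 60)
after_delay = greylist.check(*triplet, now=25 * 60)
# Same server, different sender: a new combination, so it waits too.
other_sender = greylist.check(
    "3.4.5.6", "wilma@example.net", "sean@pickint.net", now=25 * 60
)

print(first_try, too_soon, after_delay, other_sender, sep="\n")
```

The first two attempts are told to try again later, the attempt at the 25 minute mark is accepted, and the email from a different sender on the same server starts its own wait.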
Basically, greylisting delays an initial email by some period of time, only “whitelisting” it after a delay period. And while it seems strange, that simple strategy can easily filter out 97% of all spam, since most spammers don't want to bother with redelivery of non-delivered email. They're trying to get their spam out as fast as possible. Attempting to redeliver their spam will only complicate things on their end.
And it's this delay that causes the biggest complaints. But first, the delay only applies to an initial email from an unknown source. Once whitelisted, no delay. Second, email is not (and never was) instant messaging, despite it appearing that way. And third … um … do not talk about Fight Club?