Spam: back from CEAS. The
schedule with links to full papers is up, so anyone can go along
and check 'em out, if you're curious.
Overall, it was pretty good -- not as good as last year's, but still
pretty worthwhile. I didn't find any of the talks to be quite up to the
standards of last year's TCP damping or Chung-Kwei papers; but the
'hallway track' was unbeatable ;)
Here are my notes:
AOL's introductory talk had some good figures. A Pew study reported that
41% of people check email first thing in the morning, 40% have checked it in
the middle of the night, and 26% don't go more than 2-3 days without checking
mail. The talk also noted that the URLs spimmed (spammed via IM) are not the
same as the URLs spammed by email, although the obfuscation techniques are the
same. AOL is using two learning databases, one per-user and one global, and
the 'Report as Spam' button feeds both.
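To make the two-database setup concrete, here's a minimal sketch in which a single 'Report as Spam' click trains both a per-user store and a global store. All names here (`TokenStore`, `report_as_spam`, `token_score`) are hypothetical, and the rule of preferring the user's own evidence and falling back to the global database is my assumption; the talk didn't say how AOL combines the two.

```python
from collections import Counter, defaultdict


class TokenStore:
    """Counts how often each token appeared in messages trained as spam or ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))  # count each token once per message

    def spamminess(self, token):
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return None  # no evidence for this token in this store
        return s / (s + h)


global_store = TokenStore()
user_stores = defaultdict(TokenStore)


def report_as_spam(user, tokens):
    # Hypothetical 'Report as Spam' handler: one report trains both databases.
    user_stores[user].train(tokens, is_spam=True)
    global_store.train(tokens, is_spam=True)


def token_score(user, token):
    # Assumption: use the user's own evidence if any, else fall back to global.
    personal = user_stores[user].spamminess(token)
    return personal if personal is not None else global_store.spamminess(token)
```

So a report from one user immediately affects that user's filtering, and (more weakly, diluted by everyone else's training) the filtering of every other user.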
Experiences with
Greylisting: John Levine's talk had some useful data. Some senders still
treat a 4xx SMTP response (temporary failure) as a 5xx (permanent
failure), particularly after the end of the DATA phase of the transaction;
an 'old version of Lotus Notes' was one example. And some legitimate
senders, such as Kodak's mail-out systems, regenerate the body in full on
each send, even after a temp fail, so the body looks different on each
retry. He found that less than 4% of real mail from real MTAs is delayed,
while 17% of his overall mail traffic was temp-failed. For the delayed
nonspam, the time between the first temp fail and eventual delivery
peaked at 400 and 900 seconds.
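For anyone who hasn't looked at how greylisting works, here's a minimal sketch, not John's implementation. It keys on the envelope triplet (client IP, sender, recipient) and temp-fails at the RCPT stage. Both choices sidestep the two problems above: it never sends a 4xx after DATA, and a regenerated body doesn't matter because the body isn't part of the key. The class name, the 300-second `min_delay`, and the injectable `clock` are assumptions for illustration. Real greylisters also expire old entries, which this omits.

```python
import time


class Greylist:
    """Minimal greylisting keyed on the (client IP, envelope sender, recipient) triplet."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # assumed: seconds a new triplet must wait before a retry
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.first_seen = {}

    def check_rcpt(self, client_ip, mail_from, rcpt_to):
        """Return the SMTP reply to send at the RCPT TO stage."""
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # 4xx = temporary failure: a well-behaved MTA queues the mail and retries later
            return "451 4.7.1 Greylisted, please try again later"
        return "250 2.1.5 OK"
```

With a retry interval in the 400 to 900 second range, as in John's figures, a legitimate MTA gets through on its first or second retry.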
As usual, there were a variety of 'antispam via social networks' talks.
Richard Clayton made a great point about all that; paraphrasing: I trust
my friends and relatives on some things, and they are in my social
network, but I don't trust their judgement of what is and is not spam.
(If you've ever talked to your mother about how she always considers mail
from Amazon to be spam, you'll know what he means.)
Combating Spam through
Legislation: A Comparative Analysis of US and European Approaches:
the EU 'opt-in' directive has now been transposed into national law
everywhere in the EU; when an EU citizen is spammed by a sender from
another EU country, the report should go to the anti-spam authority in
the sender's country; and there's something called 'ECNSA', an EU contact
network of spam authorities, which sounds interesting (although
ungoogleable).
Searching For John Doe: Finding
Spammers and Phishers: Microsoft's anti-spam attorney, Aaron Kornblum,
gave a good talk discussing their recent court cases. Notably, he
described one case where an Austrian domain owner had set up a redirector
site that sounded like it was expressly intended for spam use, which was
news to me (and worrying).
A Game Theoretic Model of Spam
E-Mailing: Ion Androutsopoulos gave a very interesting talk on a
game-theoretic approach to anti-spam. It was a little too complex for the
time allotted, but I'd say the paper is worth a read.
Understanding How Spammers
Steal Your E-Mail Address: An Analysis of the First Six Months of Data
from Project Honey Pot: Matthew Prince of Project Honey Pot had some
excellent data in this talk; recommended. He found an exponential
relationship between Google PageRank and the spam received at scraped
addresses, which matches my theory of how scrapers work. He also found
that only 3.2% of address-harvesting IPs are in proxy/zombie lists,
compared to 14% of spam SMTP delivery IPs. (BTW, my theory is that
address scraping generally uses Google search results as a seed, which
explains the PageRank relationship.)
Computers beat Humans at Single
Character Recognition in Reading based Human Interaction Proofs
(HIPs): this presented some great demonstrations of how a neural
network can be used to solve HIPs (aka CAPTCHAs) automatically. However,
I'm unsure how useful this result is in practice, given that the NN
required 90,000 training characters to achieve the accuracy levels noted
in the paper. Unless attackers have their own copy of the HIP
implementation to run locally, they'd have to spend months solving HIPs
manually to train the network before an attack is viable.
Throttling Outgoing SPAM for
Webmail Services: this paper cites
Goodman in ACM E-Commerce 2004 as saying that ESP webmail services are
a 'substantial source of spam', which was news to me! (I'd guess they
account for less than 1% of spam corpora.) It then proposes requiring
anyone who submits email via an ESP's webmail system to perform a
hashcash-style proof-of-work before their message is delivered. By using
a Bayesian spam filter to classify submitted messages, the ESP can make
spammers perform more work than non-spammers, thereby reducing their
throughput. It didn't strike me as particularly useful. Yahoo!'s Miles
Libbey got right to the heart of the matter, asking if they'd considered
a situation where spammers have access to more than one computer; they
had not. A better paper for this situation would be Alan Judge's USENIX
LISA 2003 paper, which discusses more industry-standard rate-limiting
techniques.
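For the curious, here's a rough sketch of the mechanism as I understood it: the server picks a difficulty from the filter's spam probability, and the client brute-forces a nonce whose SHA-256 hash has that many leading zero bits. This is hashcash-*style* only. Real hashcash uses a specific stamp format and SHA-1, and the `base_bits`/`extra_bits` policy and all function names are my assumptions, not the paper's.

```python
import hashlib
import itertools


def required_bits(spam_prob, base_bits=8, extra_bits=12):
    """Assumed policy: the more spammy the submission looks, the more zero bits it must find."""
    return base_bits + round(spam_prob * extra_bits)


def leading_zero_bits(digest):
    """Count the leading zero bits of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zero bits above this byte's highest set bit
        break
    return bits


def solve(challenge, bits):
    """Client side: brute-force a nonce; expected cost is about 2**bits hashes."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def verify(challenge, nonce, bits):
    """Server side: checking a submitted nonce costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

With these numbers, a likely-spam message costs about 2**20 hashes versus 2**8 for clean mail. That also illustrates Miles's objection: the cost is per message, so a spammer with a thousand machines simply spreads the work across them.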
SMTP Path Analysis: IBM
Research's anti-spam team discussed techniques very similar to several
already used in SpamAssassin, where our versions have been around for a
while: the auto-whitelist (which tracks the submitter's IP address
truncated to its /16 network block) since 2001 or 2002, and the Bayes
tweaks we added from bug 2384 since 2003.
Naive Bayes Spam Filtering
Using Word-Position-Based Attributes: an interesting tweak to
Bayesian classification using a 'distance from start' metric for the
tokens in a message. Worth trying out for Bayesian-style filters,
I think.
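One simple way to try this out, assuming a coarse bucketing scheme that may not match the paper's exact attributes, is to tag each token with how far from the start of the message it appears. The resulting tokens can then go into an unmodified Bayes learner. `position_tokens`, `bucket_size`, and `max_bucket` are hypothetical names and values.

```python
import re


def position_tokens(text, bucket_size=10, max_bucket=5):
    """Pair each word with a coarse 'distance from start' bucket (assumed scheme)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = []
    for i, word in enumerate(words):
        bucket = min(i // bucket_size, max_bucket)  # words far from the start share the last bucket
        tokens.append(f"{word}@{bucket}")
    return tokens
```

So "free" in the first ten words becomes `free@0`, a different token from `free@5` near the end, letting the filter learn that position matters.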
Good Word Attacks on
Statistical Spam Filters: not so exciting. It was a bit of a rehash of
several other papers (jgc's talk at the MIT conference on attacking a
Bayesian-style spam filter, and the previous year's CEAS paper on using a
selection of good words from the SpamBayes guys), and it entirely missed
something we found in our
own tech report: effective attacks will result in poisoned training data,
with a significant bias towards false positives. In my opinion, that
poisoning effect is a big issue that needs more investigation.
Stopping Outgoing Spam by
Examining Incoming Server Logs: Richard Clayton's talk. Well worth a
read. It's an interesting technique for ISPs: detecting outgoing spam by
monitoring connections to your own MX from hosts in your own dial-up
pools that show known ratware patterns.
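Here's a rough sketch of the shape of this. As I understand the idea, a spamming zombie on your dial-up pool will hit some of your own customers' addresses too, so its traffic shows up in your MX logs. The log format, the address ranges, the MX name, and especially the two 'ratware' heuristics below are hypothetical placeholders for illustration, not the patterns from Richard's paper.

```python
import ipaddress
from collections import Counter

# Hypothetical: address ranges of the ISP's own dial-up pools, and its MX hostname.
DIALUP_POOLS = [ipaddress.ip_network("198.51.100.0/24")]
OUR_MX_NAME = "mx.example.net"


def in_dialup_pool(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DIALUP_POOLS)


def looks_like_ratware(helo):
    """Placeholder heuristics for illustration only."""
    helo = helo.lower()
    bare_ip = helo.replace(".", "").isdigit()  # unbracketed IP literal as the HELO name
    forged_ours = helo == OUR_MX_NAME          # client claims to be our own MX
    return bare_ip or forged_ours


def suspect_customers(log_lines, threshold=3):
    """Return dial-up IPs with at least `threshold` ratware-looking connections to our MX."""
    hits = Counter()
    for line in log_lines:
        # Assumed log format: "<timestamp> <client-ip> <helo-name>"
        _, ip, helo = line.split()[:3]
        if in_dialup_pool(ip) and looks_like_ratware(helo):
            hits[ip] += 1
    return sorted(ip for ip, n in hits.items() if n >= threshold)
```

The appeal is that it needs nothing beyond logs the ISP already has. Nobody has to inspect the customer's outbound traffic directly.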
Originally posted on 8/17/2005 10:38:15 PM
Content source: http://taint.org/2005/07/25/080041a.html