Google Groups, Privacy and Spam
On Farber’s Interesting People list, Lauren Weinstein writes:
Their new system is obscuring *all* e-mail addresses in *all*
netnews messages in the archive (including the vast numbers of
messages that do not originate within the Google environment and/or
that predate the existence of Google Groups). This includes not
only the addresses of individual netnews item authors, but also all
e-mail addresses within the body of those messages including contact
addresses, list addresses, administration addresses, etc.…
There is no way (that I can find) to restore any of the e-mail
addresses in the headers or bodies of these messages, including
items ported in from external mailing lists. The “show original”
option simply provides an unparsed textual version — but all e-mail
addresses are still mangled. In some cases it might be possible to
guess the missing portions of the addresses, but in most cases this
would not be possible.
Actually, it is possible to find the original email addresses. You just need to use Google for it.
Take a message such as this one [link to http://groups-beta.google.com/group/comp.mail.sendmail/browse_thread/thread/2b0be92fcd07d403/0ab96e752d68dc93? no longer works]. Note the “a…@bwh.harvard.edu” email address. You now have a valid domain, and can start constructing addresses, and feeding them into Google. At some point, your address construction algorithm will emit “adam”, and you’ll get a link back to the original message.
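To make the "address construction" step concrete, here is a minimal sketch of the idea, not a working attack tool. It assumes the mangled form keeps the first characters of the local part and the full domain, as in the example above. The name list and the `has_hit` callback are hypothetical stand-ins: `has_hit` represents "searching the archive for this exact address returned a result" and is not a real search API.

```python
from typing import Callable, Iterable, Iterator, Optional


def parse_mangled(address: str) -> tuple[str, str]:
    """Split a mangled address such as 'a…@bwh.harvard.edu' into the
    visible prefix of the local part ('a') and the intact domain."""
    local, _, domain = address.partition("@")
    # Strip the trailing ellipsis, whether it is '…' or '...'.
    prefix = local.rstrip(".…")
    return prefix, domain


def candidates(mangled: str, names: Iterable[str]) -> Iterator[str]:
    """Yield full addresses whose local part starts with the visible prefix."""
    prefix, domain = parse_mangled(mangled)
    for name in names:
        if name.startswith(prefix):
            yield f"{name}@{domain}"


def first_confirmed(
    mangled: str,
    names: Iterable[str],
    has_hit: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate that the (hypothetical) search confirms."""
    for candidate in candidates(mangled, names):
        if has_hit(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical word list; a real attempt would use a much larger one.
    names = ["bob", "alice", "adam", "zed"]
    # Stub standing in for "the search engine found this address".
    found = first_confirmed(
        "a…@bwh.harvard.edu", names, lambda c: c.startswith("adam@")
    )
    print(found)
```

The visible prefix and the known domain cut the search space sharply: only names starting with "a" are ever tried, which is why the mangling leaks more than it appears to.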
Now, that’s not easy, and only, ummm, spammers will really go to the effort, at least until some hacker writes exploit code that automates it.
Like a lot of security measures, this one falls to many eyeballs looking for a clever way around it. (Google could, in principle, detect and block the attack, but that would require adding a huge amount of distributed state to its web servers, which strikes me as unlikely.)
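To illustrate what that state would look like, here is a rough single-process sketch of a detector, under the assumption that "the attack" shows up as one client searching for many different addresses at the same domain. The class name, the threshold, and the client identifier are all hypothetical; nothing here reflects how Google's servers actually work.

```python
from collections import defaultdict


class GuessDetector:
    """Flag clients that query many distinct addresses at one domain.

    Hypothetical sketch: the per-(client, domain) table below is the
    state that every front-end server would have to share.
    """

    def __init__(self, threshold: int = 20) -> None:
        self.threshold = threshold
        # (client id, domain) -> set of local parts that client has tried
        self._seen: dict[tuple[str, str], set[str]] = defaultdict(set)

    def record(self, client: str, query: str) -> bool:
        """Record one search query; return True if the client looks suspicious."""
        local, sep, domain = query.strip().lower().partition("@")
        if not sep or not local or not domain:
            return False  # not an address-shaped query, ignore it
        tried = self._seen[(client, domain)]
        tried.add(local)
        return len(tried) > self.threshold


if __name__ == "__main__":
    detector = GuessDetector(threshold=2)
    for guess in ["aaron", "abby", "adam"]:
        flagged = detector.record("client-1", f"{guess}@bwh.harvard.edu")
    print(flagged)
```

On one machine this is a small dictionary. Across a large fleet of web servers, where any server may handle any request from a given client, the same table has to be kept consistent everywhere, and an attacker who spreads queries across many clients weakens it further.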
[Update: Lauren’s message raises important issues about Google mangling the data, and the copyright questions that go with it. I was responding to a side point, but if you can bypass their message mangling and get the email addresses, then their security measure is just theatre.]