Shostack + Friends Blog Archive


Researchers Two-Faced over Facebook Data Release

[Update: Michael Zimmer points out that it wasn’t Facebook, but outside researchers who released the data.]

I wanted to comment quickly on an interesting post by Michael Zimmer, “On the ‘Anonymity’ of the Facebook Dataset.” He discusses how

A group of researchers have released a dataset of Facebook profile information from a group of college students for research purposes, which I know a lot of people will find quite valuable.


Of course, this sounds like an AOL-search-data-release-style privacy disaster waiting to happen. Recognizing this, the researchers detail some of the steps they’ve taken to try to protect the privacy of the subjects, including:

  • All identifying information was deleted or encoded immediately after the data were downloaded.
  • The roster of student names and identification numbers is maintained on a secure local server accessible only by the authors of this study. This roster will be destroyed immediately after the last wave of data is processed.

In the comments, Jason Kaufman implies that the data really isn’t that private, asking what could go wrong and why anyone would post it to Facebook expecting it to remain private.

I have just one question on all of this. If the data isn’t private, why did they attempt to anonymize it?

I believe they attempted to anonymize it because it’s fairly obvious that the data is private, and releasing it with names obviously attached would be pretty shocking. As Michael Zimmer says, “we really need to keep working on a new set of Internet research ethics and methodologies.”

Also, don’t miss Michael Zimmer’s follow-up post, “More on the anonymity of the Facebook dataset: It’s Harvard College.”

2 comments on "Researchers Two-Faced over Facebook Data Release"

  • Chris says:

    I applaud the way these researchers made their data available.
    That said, I wish said data had never been collected.
    I downloaded the codebook, but not the data. From the marginals it seems re-identification of at least some subjects would be trivial. For example, 1 subject is from Montana. How many minutes with LexisNexis would find a newspaper report applauding a young Montanan for getting into Elite Northeastern Univ?

  • You’re right; they are going to great (but insufficient) lengths to anonymize data that they later (incorrectly) claim isn’t private in the first place. Faulty logic.
    However, to be clear, Facebook isn’t making any such claims. It is a group of researchers from Harvard and UCLA.
