Shostack + Friends Blog Archive

Data on Data Breaches

At the FIRST conference in Seville, Spain, I delivered a presentation about “Data on Data Breaches” that Adam and I put together. The slides, with the notes I made to act as “cue cards” for me, are available as a large PDF file on a slow web server.
The main points I tried to make are:
That with the availability of breach reports direct from states with central reporting, such as New York, it is possible to measure part of our ignorance when we rely solely on published breach reports — even the best available sources (such as Attrition’s DLDOS) undercount breaches dramatically, and are biased toward larger incidents.
That we are still at the leading edge of an explosion of information, and that we should not draw hasty conclusions until more facts are in.
That, as Emil Faber might put it, “Knowledge is Good” and is not that painful to provide.
And finally, primary materials such as breach reports are useful artifacts not only because they tell us dry facts in a standardized format (but that IS nice), but also because the notices themselves are interesting evidence of how firms talk to their customers about a difficult topic.
I’ll be writing more on this subject now that I have received the fourth batch of breach reports from my pals in New York, and my other pals in New Hampshire have made such materials available on-line.
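The first point above, measuring how much a published breach list misses, can be illustrated with a rough sketch. It assumes breaches in both sources can be matched on a shared identifier, which in practice is the hard part. The function name and the organization labels are hypothetical and invented for illustration; they are not data from the presentation.

```python
def coverage(state_reports, public_reports):
    """Fraction of state-reported breaches that also appear in a public list."""
    state = set(state_reports)
    if not state:
        raise ValueError("no state-reported breaches to compare against")
    # Breaches present in both sources, divided by all state-reported breaches.
    return len(state & set(public_reports)) / len(state)


# Hypothetical identifiers, made up for this example only.
state_reports = ["org-a", "org-b", "org-c", "org-d", "org-e"]
public_reports = ["org-b", "org-e", "org-z"]

share = coverage(state_reports, public_reports)
print(f"{share:.0%} of state-reported breaches appear in the public list")
```

A low share from a state with central reporting would suggest the public list undercounts. It says nothing about breaches that were never detected or reported to anyone.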

Originally published by cwalsh on 28 Jun 2007
Last modified on 28 Jun 2007
Categories: breach analysis presentations Uncategorized

7 comments on "Data on Data Breaches"

Andy Steingruebl says:

29 Jun 2007 at 1:04 pm

What I find interesting about the analysis of the reported breaches is that we don’t have a way to tie individual breaches back to cases of identity theft so that we have some idea of the actual impact of the breaches.
You talk about false negatives (undetected data loss). It would be illuminating to know how often identity theft happens as a result of these data losses versus other means that aren’t necessarily reported. If we had a way to compare these, we’d have a much better idea of root cause.
For example, in most of the data losses we’ve seen where large amounts of data went missing on laptops, we don’t have any idea whether the laptop was quickly formatted and pawned, or sold to a data broker who used it to commit or help commit identity theft.
Any thoughts on how to tackle this piece of the puzzle?
Chris says:

29 Jun 2007 at 7:48 pm

One way to estimate the extent to which having your PII exposed in a breach increases the probability of your becoming an identity theft victim is to watch for the exposed data elements using a fraud detection network. This is something that the folks at ID Analytics offer as a product, actually. I don’t know how effective their fraud detection stuff is, or whether its coverage is biased in any way, but they have a good idea in principle.
Other than using banks as a focal point and having them report on fraud using these stolen elements, I cannot think of another way. Too bad there is so little info available about how the ID Analytics system works, because it is intriguing.
I don’t see any reason, in principle, that this kind of thing couldn’t be done by others (like the credit bureaus), but there may be legal obstacles, and unless they are compelled to do it or can make money at it, there’s no reason for them to try.
I suppose one could try to determine whether the stolen elements were in the inventory of any black-market sellers, but I do not see how one can gain access to their inventory information. It’s clear that the illicit trade in this stuff is non-trivial, but I honestly do not know that we have anything approaching a comprehensive picture of the landscape.
Andy Steingruebl says:

29 Jun 2007 at 11:06 pm

What I’m reminded of in this debate is the recent reports from the Mitre folks correlating the CVE data to the CWE data. This allows us not just to judge the number of vulnerabilities per product, but also to understand root causes and know where to spend energy on fixes. Or it tells us which vulnerabilities are easiest to exploit, and hence the ones we’re smartest to fix.
Since we don’t know how many of the data breaches we’re seeing result in identity theft vs. other avenues, we don’t actually have any ability to prioritize security. Maybe insiders are actually stealing tons of data, we don’t know it, and we’re going to spend a lot of money on laptop security instead of better audit logs… Who knows. We’re operating in a vacuum at this point in time.
Chris says:

30 Jun 2007 at 10:45 am

Your second paragraph contains two points.
1. We don’t know how much reported data breaches contribute to ID theft.
2. We don’t know how many data breaches there are in the first place.
I addressed 1) in my response to your previous comment.
2), in other words, is “there is an unknown number of undetected breaches.”
I fully acknowledge 2), and called it out in the presentation. The data we get from the states gives us more than we typically have been working with, but clearly if a SSN is stolen in the forest and nobody hears it leave the database, it doesn’t get reported to anyone.
You are putting your finger on an important point. If what we care about is reducing ID theft, then maybe all this effort about analyzing breach reports is a sideshow, since for all we know 80% of the revealed PII never gets detected as having been revealed. Or maybe it’s 40%. Or 10%. I am interested in this question because it is a cool question. The policy ramifications are a close second to me. Others may have a different view. One approach (http://chrishoofnagle.com/blog/?p=696) to dealing with measuring ID theft (not breaches, but ID theft) has been put forward by Chris Hoofnagle, and it involves making banks mandatory reporters. It’s an intriguing concept that (if it doesn’t create other issues) neatly sidesteps the “dark matter” problem.
Dissent says:

30 Jun 2007 at 5:17 pm

I’m still in the process of compiling health-related or medical privacy breach reports from some sources. Suppose it turns out that all cases of ID theft in the sample are associated with insider/employee theft of PII. Then what? Some might argue that those data suggest there is no need to notify patients in the event of a hack or lost hard drive, but I disagree that ID theft risk is the sole or most important criterion in determining notification.
Did you read that VA OIG report that came out yesterday? The OIG is suggesting that the govt take another look at whether to notify individuals in the event of another incident involving SSN:
“This data loss incident raises concerns over the lack of Government-wide guidance and criteria on what constitutes high risk data for identity theft and credit protection services. Without well thought-out guidance, Federal agencies are likely to make inconsistent decisions about what protections to offer affected individuals. The question arises whether it is a prudent use of Government resources to offer a year of free credit monitoring to nearly 180,000 individuals at risk solely because their SSN was lost in this breach. For example, some law enforcement agencies have taken the position that release of a SSN alone does not put an individual at risk for identity theft. Because data loss is a systemic problem throughout the public and private sector, developing criteria and guidance for assessing risk associated with a breach of sensitive information should not be relegated to any one Department. An example of why Government-wide criteria is needed is evidenced in the Birmingham data loss case, where some of the missing data is from another Federal agency.” (p. 14)
If all they’re concerned about is ID theft, then they will cut notifications way down. But in my opinion, they should be notifying individuals because dammit, we have a right to know if someone with custody of our details lost them or compromised them.
Chris says:

30 Jun 2007 at 9:47 pm

I did not read the OIG’s report.
I did notice that the VA had a contract with ID Analytics to assess whether the PII in the large VA breach was used to further ID theft. $25,000 to monitor over 25 million IDs is pretty cheap. Way cheaper than “ID theft insurance”.
I suspect that there will be a movement to this kind of thing. There’s a good argument that absent the sort of data I have been arguing we should gather, such a move is premature.
It is even more premature if you care about things other than ID theft, such as privacy.
Andy Steingruebl says:

30 Jun 2007 at 10:31 pm

I just responded to Adam’s commentary on the other post and on my own blog…
http://securityretentive.blogspot.com/2007/06/data-breaches-and-privacy-violations.html
