Second look: What kind of year was 2007 in terms of data breaches?

January 3, 2008 1:02 pm

In response to media reports that the theft of personal data more than tripled in 2007 and that there were a record number of data breaches in 2007, I tried to show that depending on how you crunch the numbers, theft did not triple and there may have been fewer new U.S. incidents in 2007 than in 2006.

Since publication of those blog entries, Lawrence Walsh of Baseline and Thomas Claburn of InformationWeek have both written pieces about 2007 in review. Walsh’s piece accepts the premise that the numbers are going up, but then discusses the number in terms of actual cost and other factors, such as whether some breaches might be justified. Claburn’s piece follows up on my analysis. He writes:

Rex Davis, director of operations for the Identity Theft Resource Center, concedes Dissent makes some valid points, like the fact that the organization began counting paper-based data breaches in 2007.

But Davis also points out that tabulating the number of records exposed is difficult because in 56% of the 2007 breaches reported there was no accurate count of the number of records exposed. “How can you say the number of records is going up or down when it’s not reported?” he said.

I agree completely with his last point, and if everyone were to say, “We don’t know how many data breaches there were in 2007 and we don’t know how many records were newly exposed in 2007 because Congress has not passed a national mandatory disclosure law,” I’d nod my head vigorously in agreement. But since the media were comparing 2007 to 2006 in terms of totals and basing the comparison, in part, on ITRC’s and Attrition.org’s data, I thought it was important to point out that if you simply take the numbers that we do have for incidents that occurred during 2007 and that were reported during 2007, the number of records newly exposed during 2007 appears to have decreased significantly from 2006. And that is a significantly different statement from what the media have trumpeted.
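To make the point about unreported counts concrete, here is a minimal sketch in Python. The incident list is invented purely for illustration and does not come from ITRC, Attrition.org, or any other source; the function name `summarize` is likewise hypothetical. It shows that a total built only from reported counts is a lower bound, and that the share of incidents with no count should travel with that total.

```python
# Hypothetical, made-up incidents: (year, records_exposed).
# None means the number of records was never reported.
incidents = [
    (2006, 1000), (2006, None), (2006, 5000),
    (2007, 200), (2007, None), (2007, None), (2007, 300),
]

def summarize(rows, year):
    """Summarize one year: incident count, known-records total, unknown share."""
    counts = [n for y, n in rows if y == year]
    known = [n for n in counts if n is not None]
    unknown_share = (len(counts) - len(known)) / len(counts) if counts else 0.0
    return {
        "incidents": len(counts),
        "known_total": sum(known),       # a lower bound, not the true total
        "unknown_share": unknown_share,  # fraction of incidents with no count
    }

for year in (2006, 2007):
    print(year, summarize(incidents, year))
```

With real data, a falling known total alongside a high unknown share would support only the narrow claim made above: the numbers we do have went down, not that total exposure went down.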

ITRC sticks to its statement about the number of incidents rising in 2007:

“If you’re talking about the number of events, it’s the worst year we’ve been able to record, even if you add the 80 we left out in 2006,” said Davis.

I appreciate that 2007 may be the worst year they’ve been able to record, but that does not mean that it was the worst year for the number of new incidents, because a number of factors could inflate the 2007 statistic without indicating an actual increase in incidents per year. Significantly, two other sites reported fewer U.S. incidents during 2007 than in 2006, a statistic that is at odds with the increased number reported by ITRC. Perhaps it would be more conservative to conclude that we simply don’t know whether the total number of incidents rose, fell, or remained the same (because of the lack of a national disclosure law), but with media sources claiming that it was a “record year” in terms of number of incidents, I thought it important to point out where the data do not support that assertion.

When we break the available numbers down by sector, we see differences from 2006 to 2007 that are not “across the board” differences. And it is those differences that I find most interesting and helpful. Each site (Attrition.org, Privacy Rights Clearinghouse, and the Identity Theft Resource Center) is probably its own best baseline for year-to-year comparisons (all else being equal). When we break incidents down by sector, do all three sites show the same pattern of decreases and increases? I provided some preliminary analyses as examples of what can be run on Etiolated.org’s site using Attrition.org’s DLDOS database. Ideally, someone could take that database, incorporate incidents from other sites that have been omitted (by design or by accident), and then run the year-to-year analyses by sector, type of breach, etc.
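As a rough sketch of what that combined analysis might look like, the Python below merges two incident lists, drops duplicates, and reports the change in incident counts per sector from one year to the next. Everything in it is an assumption made for illustration: the records are invented, the names `site_a`, `site_b`, `merge`, `by_sector`, and `year_over_year` are hypothetical, and it does not reflect the actual layout of the DLDOS database or any site's data.

```python
from collections import Counter

# Hypothetical, made-up records: (organization, year, sector).
site_a = [
    ("Org A", 2006, "education"), ("Org B", 2006, "government"),
    ("Org C", 2007, "education"), ("Org D", 2007, "business"),
]
site_b = [
    ("org a", 2006, "education"), ("Org E", 2006, "education"),
    ("Org D", 2007, "business"), ("Org F", 2007, "business"),
]

def merge(*sources):
    """Combine sources, keeping the first record per (organization, year).

    Assumption: an organization has at most one incident per year. Real data
    would need a finer key, such as the incident date.
    """
    seen, merged = set(), []
    for source in sources:
        for org, year, sector in source:
            key = (org.strip().lower(), year)
            if key not in seen:
                seen.add(key)
                merged.append((org, year, sector))
    return merged

def by_sector(rows, year):
    """Count incidents per sector for a single year."""
    return Counter(sector for _, y, sector in rows if y == year)

def year_over_year(rows, prev, curr):
    """Change in incident count per sector from prev to curr."""
    before, after = by_sector(rows, prev), by_sector(rows, curr)
    return {s: after[s] - before[s] for s in sorted(set(before) | set(after))}

print(year_over_year(merge(site_a, site_b), 2006, 2007))
```

Running the same comparison on each site's list separately, and then on the merged list, is one way to check whether all three sites show the same pattern of sector-level increases and decreases.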

To be clear: I’m not “insisting” that I’m right in my analysis of 2007 vs. 2006. What I am “insisting” is that the data do not support the sweeping statements made by the media because of all the confounds and that it is possible that 2007 was actually a better year in some respects. Conducting more refined analyses by sector and type of breach will hopefully shed some light on what was worse, what wasn’t, and what improved in 2007. But even then, we need to recognize that we only have a subset of all breaches for the year and for many of them, no numbers at all.

The bottom line is that if we want to make any sense out of data, we need more transparency and mandatory disclosure so that we can get ALL of the numbers on ALL of the incidents.

Congress, are you listening yet?
