Looking at 2007’s data breaches in perspective

December 31, 2007 10:42 am

2007 is being described as a “banner year” or the “worst year ever” in terms of data breaches, depending on which headlines you read. But does 2007 really live up to the hype? If we look only at breaches that occurred during 2007, was 2007 the “worst year ever?” An article by Mark Jewell of the Associated Press highlights some of the difficulties in drawing meaningful conclusions as to whether 2007 was worse than 2006:

Foley’s group lists more than 79 million records reported compromised in the United States through Dec. 18. That’s a nearly fourfold increase from the nearly 20 million records reported in all of 2006.

Another group, Attrition.org, estimates more than 162 million records compromised through Dec. 21—both in the U.S. and overseas, unlike the other group’s U.S.-only list. Attrition reported 49 million last year.

Finding the Common Denominators

The following table depicts the number of U.S. incidents reported and the corresponding number of records reported exposed by the three main sites that track such data: Attrition.org, the Privacy Rights Clearinghouse (PRC), and the Identity Theft Resource Center (ITRC).


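Source                         | Number of U.S. Incidents, 2006 | Number of Records, 2006¹ | Number of U.S. Incidents, 2007 | Number of Records, 2007¹ | Number of Records, 2007, excluding TJX
-------------------------------|--------------------------------|--------------------------|--------------------------------|--------------------------|---------------------------------------
Attrition.org                  | 326                            | 45,538,298               | 275                            | 126,231,985              | 32,231,985
Privacy Rights Clearinghouse   | 322                            | 100,453,730²             | <300³                          | ?                        | ?
Identity Theft Resource Center | 392⁴                           | >49,000,000⁵             | 443⁶                           | 127,369,523              | 33,369,523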


1 For both 2006 and 2007, there are many incidents for which we have no numbers, so numbers indicated in the table are only a subset of the actual number of records exposed for the incidents reported for the year. And of course, the number of incidents reported for each year is only a subset of all incidents for each year because there are many incidents we never find out about.

2 PRC’s total of 100,453,730 minimum for the year does not correspond to their actual chronology for the year if you exclude the numbers they said they were excluding due to factors such as no SSN involved, etc. For incidents that they included in their chronology total, their figures would appear to be closer to that reported by Attrition.org for 2006. The 100 million figure is probably a better estimate (although still an underestimate) of the number of records containing personally identifiable information that were exposed last year, but comparing it with the other sources’ totals is a bit of apples and oranges.

3 Estimated, based on skimming their chronology for this year.

4 In 2006, ITRC reported and tabulated 312 incidents plus 80+ other incidents that they reported but did not include in their figures.

5 Because of ITRC’s criteria in 2006, they did not include in their totals over 80 incidents, including the VA data breach in May 2006 affecting over 26.5 million or the Chase incident affecting 2.6 million records. Had those two incidents plus other incidents they reported but did not include been included, their total for 2006, which was reported as 19,013,371 (and which Mark Jewell uses as a basis for comparison), would have been over 49,000,000 — somewhat higher than Attrition.org’s figure for the year.

6 ITRC changed its inclusion criteria and its method for locating breaches in 2007. Prior to 2007, they did not include paper breaches, and in 2007, they also began using PogoWasRight.org as a primary resource for locating incidents for their analyses. Hence, their higher figure for number of incidents must be interpreted with caution as they now include many smaller incidents that they might not have found or included in 2006.

When the additional information is considered, the data suggest that all three sources reported about 45-50 million records containing SSN or financial details exposed in 2006. And the two sources for whom we have 2007 data both reported fewer incidents in 2007 and a 30% decrease in number of records exposed in 2007 if the outlier TJX data are excluded because the breach itself occurred prior to 2007. If the TJX data were assigned to 2006 or 2005, the decrease in number of records would appear even more dramatic.
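For anyone who wants to check the arithmetic behind the "30% decrease," here is a minimal sketch in Python. It uses only the Attrition.org and ITRC figures from the table; the variable and function names are my own, and ITRC's 2006 value is treated as exactly 49,000,000 even though the table gives it only as a lower bound.

```python
# Figures copied from the table (records reported exposed, U.S. incidents).
# ITRC's 2006 value is a lower bound (">49,000,000"), so its decrease is a floor.
records = {
    "Attrition.org": {"2006": 45_538_298, "2007": 126_231_985, "2007_ex_tjx": 32_231_985},
    "ITRC": {"2006": 49_000_000, "2007": 127_369_523, "2007_ex_tjx": 33_369_523},
}


def pct_change(old, new):
    """Percent change from old to new; negative means a decrease."""
    return (new - old) / old * 100


results = {}
for source, r in records.items():
    results[source] = {
        # Difference between the two 2007 columns is the TJX contribution.
        "tjx_records": r["2007"] - r["2007_ex_tjx"],
        "change_with_tjx": pct_change(r["2006"], r["2007"]),
        "change_without_tjx": pct_change(r["2006"], r["2007_ex_tjx"]),
    }

for source, res in results.items():
    print(
        f"{source}: TJX records {res['tjx_records']:,}; "
        f"with TJX {res['change_with_tjx']:+.1f}%, "
        f"without TJX {res['change_without_tjx']:+.1f}%"
    )
```

With TJX excluded, the two sources come out at roughly a 29% and a 32% decrease in records, which is where the "30%" comes from. With TJX included, the same calculation shows a large increase, which is the figure the headlines are built on.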

Fewer incidents. Fewer records exposed. That sounds like progress, doesn’t it? And given that there were more states requiring disclosure in 2007 than in 2006 and that despite the new mandates, there were reportedly fewer new incidents, why are people talking about incidents and exposure increasing? Yes, we found out about more records exposed this year, but that does not mean that more records were exposed this year (well, in the U.S. anyway — the U.K. is having its own version of the VA debacles of 2006 but with their HMRC). If next year we discover that a breach in 2005 exposed 150,000,000 records, should we add that to next year’s data, or go back and revisit the 2005 data? If our purpose is to analyze trends, then including the TJX data in 2007 analyses is only going to mislead us. Certainly the numbers should be reflected in any grand total computed over years. My objection is to having an outlier from one year be included in analyses for another year.
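To make the point about outliers concrete, here is a rough sketch of how the same records produce different yearly pictures depending on whether a breach is counted in the year it was reported or the year it occurred. The totals are Attrition.org's from the table. Two things are assumptions for illustration only: the TJX breach is assigned to 2006 (it could as easily be 2005), and every other incident is treated as having occurred in the year it was reported.

```python
from collections import defaultdict

# Each entry: (label, year_reported, year_occurred, records_exposed).
# Year-of-occurrence values are illustrative assumptions, not reported facts.
incidents = [
    ("All 2006 incidents (Attrition.org total)", 2006, 2006, 45_538_298),
    ("2007 incidents other than TJX", 2007, 2007, 32_231_985),
    ("TJX", 2007, 2006, 94_000_000),  # assumed occurrence year
]


def totals_by(rows, key_index):
    """Sum records exposed, grouped by the year found at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


by_reported = totals_by(incidents, 1)  # group by year of disclosure
by_occurred = totals_by(incidents, 2)  # group by year of occurrence

for year in sorted(by_reported):
    print(f"{year}: reported {by_reported[year]:,} | occurred {by_occurred.get(year, 0):,}")
```

The grand total across years is identical either way; only the year-to-year trend changes, and that is the part a trend analysis depends on.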

Perhaps a more refined analysis could point out in which sectors the number of incidents may have increased in 2007, where it decreased, and where there was no change. Searching Attrition.org’s database with Etiolated.org’s research tool reveals some interesting findings:

Sector or Type of Organization:

  • The number of breaches involving U.S. businesses decreased from 108 in 2006 to 78 in 2007, but the number of records exposed doubled — and that’s with the TJX incident excluded. If the TJX data are included, then the number of records exposed increased by a factor of 13.
  • The number of breaches involving educational institutions remained the same from 2006 to 2007, but the number of records exposed decreased by more than 50%.
  • The number of breaches involving the federal government decreased by 50% from 2006 to 2007, with a significant decrease in number of records exposed. This is primarily due to the VA breach in 2006 inflating the 2006 numbers significantly.
  • The number of breaches involving financial institutions decreased from 43 in 2006 to 28 in 2007, but the number of records exposed more than doubled, primarily due to the TD Ameritrade and Fidelity/Certegy incidents.
  • The number of breaches involving state governments remained about the same in 2007, but the number of records exposed nearly doubled, due primarily to one incident involving Affiliated Computer Services.

Breach Source:

  • The number of “outside” incidents rose slightly in 2007, but the number of records exposed decreased by about 50% when TJX data are excluded.
  • The number of “inside” incidents decreased by over 50% from 2006 to 2007, but the number of records exposed more than doubled, due, in large part, to the Fidelity/Certegy incident.

Breach Type:

  • The number of incidents involving hacks decreased by about 20% in 2007, but the number of records exposed increased. Even with the TJX data excluded, there is still a significant increase in the number of records exposed due to hacking in 2007.
  • The number of incidents involving web exposure increased by about 30% in 2007, but the number of records exposed decreased by over 60%.

Anyone who’s interested can run additional analyses of their own via Etiolated.org, which is an excellent site. For now, I think that claims that problems are increasing are misleading, at best. It would be more helpful, I think, to take a look at the progress that has been made, and then see where progress has not been made, or where things may actually be worse.

But more importantly (to me, anyway): do the numbers from security breach analyses really give any kind of accurate indicator about the extent of privacy breaches? Not one of the three sources included the Astroglide breach. Nor do the sites include incidents where the information exposed is “just” directory information or names and email addresses. Although I understand the reasoning of those who do not consider these incidents as putting people at risk, the Astroglide breach and the recent breach involving names and email addresses of those who visit adult web sites serve as glaring reminders that security breaches that do not include SSN or financial details have the potential for significant personal consequences. As someone who is focused on the privacy aspects, I wish that those organizations and sites that track breaches would use a broader definition of PII.

In any event, I don’t see 2007 as a “banner year” for data breaches, unless the banner reads, “Hooray, we may be making some progress!”

And thus endeth this, my last mini-rant for 2007.

Happy New Year, everyone.

Update: two subsequent posts where I provide additional statistical analyses of 2007 vs. 2006 breach data can be found here and here.

5 Responses to “Looking at 2007’s data breaches in perspective”

  1. ed dickson says:

    Great post – it puts the entire thing in perspective!

  2. Dissent says:

    Thanks, Ed. It may not be as headline-grabbing to say, “Hey, it wasn’t as bad as the year before in some respects,” but it does seem more accurate.

  3. Excellent post. One other point you might want to consider is that, for at least the TJX case, it is not at all clear how many records were actually lost. The 94 million figure came from the filings of the plaintiff in that case, and, as I mentioned here http://ephemerallaw.blogspot.com/2008/01/2007-year-of-controversy.html could be highly overstated.

  4. dissent says:

    I’ve tried to get a reply from TJX on that point. So far, no joy, but I’ll try again this week. I think we can all understand why they may not want to answer that question, but it would be helpful to those who try to analyze data.

  5. Chris says:

    Next headline: European breaches increase 23,000%!!!!!!

    (I kid because I love)
