Daniel Barth-Jones has a critique of re-identification studies that informs the conversation about risks:
In a recent Health Affairs blog article, I provide a critical re-examination of the famous re-identification of Massachusetts Governor William Weld’s health information. This famous re-identification attack was popularized by recently appointed FTC Senior Privacy Adviser, Paul Ohm, in his 2010 paper “Broken Promises of Privacy”. Ohm’s paper provides a gripping account of Latanya Sweeney’s famous re-identification of Weld’s health insurance data using a Cambridge, MA voter list. The Weld attack has been frequently cited, echoing Ohm’s claim that computer scientists can purportedly identify individuals within de-identified data with “astonishing ease.”
However, the voter list supposedly used to “re-identify” Weld contained only 54,000 residents, and Cambridge demographics at the time of the re-identification attempt show that the population was nearly 100,000 persons. So the linkage between the data sources could not have provided definitive evidence of re-identification. The findings from this critical re-examination of the famous Weld re-identification attack indicate that he was quite likely re-identifiable only by virtue of his having been a public figure experiencing a well-publicized hospitalization, rather than there being any actual certainty to his purported re-identification via the Cambridge voter data. His “shooting-fish-in-a-barrel” re-identification had several important advantages which would not have existed for any random re-identification target. It is clear from the statistics for this famous re-identification attack that the purported method of voter list linkage could not have definitively re-identified Weld and, while the odds were somewhat better than a coin-flip, they fell quite short of the certainty that is implied by the term “re-identification”.
The full detail of this methodological flaw underlying the famous Weld/Cambridge re-identification attacks is available in my recently released paper. This fatal flaw, the inability to confirm that Weld was indeed the only man in his ZIP Code with his birthdate, exposes the critical logic underlying all re-identification attacks.
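To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch. It is not Barth-Jones's own calculation. The two population figures come from the quote, with "nearly 100,000" rounded to 100,000. The function names and the per-person match probability `p_match` are hypothetical, introduced only for illustration, and the second function assumes each unlisted resident matches independently. The point is that a unique match on an incomplete list only confirms an identity if nobody missing from the list shares the same ZIP code, sex, and birthdate.

```python
def list_coverage(listed: int, population: int) -> float:
    """Fraction of the population that appears on the linkage list."""
    return listed / population


def prob_no_unlisted_match(unlisted: int, p_match: float) -> float:
    """Probability that none of the unlisted residents shares the target's
    quasi-identifiers, assuming each one matches independently with
    probability p_match (a hypothetical, illustrative parameter)."""
    return (1.0 - p_match) ** unlisted


# Figures taken from the quoted passage ("nearly 100,000" rounded up).
VOTER_LIST_SIZE = 54_000
POPULATION = 100_000

coverage = list_coverage(VOTER_LIST_SIZE, POPULATION)
unlisted = POPULATION - VOTER_LIST_SIZE

print(f"Voter list coverage: {coverage:.0%}")
print(f"Residents not on the voter list: {unlisted:,}")

# Illustrative values only; the quote does not supply a match probability.
for p_match in (1e-6, 1e-5, 1e-4):
    certainty = prob_no_unlisted_match(unlisted, p_match)
    print(f"p_match={p_match:g}: P(no unlisted resident matches) = {certainty:.3f}")
```

The list covers only a little over half of the population, so whatever value `p_match` actually takes, the tens of thousands of unlisted residents leave room for an undetected second match. That gap is what the quote means when it says the linkage fell short of certainty.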
Statistics for the Utterly Confused by inju, flickr. Used under Creative Commons License.