While the EU is in the process of revising its consumer privacy policies, especially with respect to PII, the US is beginning to rethink its definition of personal data. Those following our blog posts know that policy makers at the FTC released Protecting Consumer Privacy in an Era of Rapid Change earlier in the year. If you haven’t read it for your homework assignment, don’t worry: we have the key take-aways and what they might mean for your company going forward.
I’ve been writing about personally identifiable information, or PII, in recent posts. This is simply the legalistic way of talking about identifiers (social security numbers, phone numbers, addresses) that can lead back to a person. Companies are required to secure PII against unauthorized access; non-PII has fewer formal controls.
It’s a small acronym with big implications: the PII definition is embedded in many US privacy laws, including the Gramm-Leach-Bliley Act (GLBA), the Fair Credit Reporting Act (FCRA), and HIPAA, to name just a few.
Like the EU, the FTC now recognizes that the line between the PII and non-PII parts of consumer records has blurred. Regulatory agencies have pointed to the same culprit: consumers are revealing information about themselves in social networks that can help “re-identify” data once considered to be anonymous.
Here’s what the FTC had to say:
“There is significant evidence demonstrating that technological advances and the ability to combine disparate pieces of data can lead to identification of a consumer, computer, or device even if the individual pieces of data do not constitute PII. Moreover, not only is it possible to re-identify non-PII data through various means, businesses have strong incentives to actually do so.”
What specific evidence was the FTC referring to?
Some may remember that in 2006, Netflix, the movie rental service, announced a public contest to improve on its existing algorithms for suggesting new films to subscribers. To give contestants something to work with, Netflix released an enormous data set of de-identified movie ratings from its database: essentially long rows of numbers recording each subscriber’s 1-to-5 rating of titles in the Netflix inventory.
Two University of Texas researchers analyzed the public Netflix data, not to enter the contest but to see if they could re-identify Netflix users. Their strategy was to compare the rows of data from Netflix against ratings submitted by subscribers to IMDb, the popular movie information site. The researchers succeeded, identifying the full preferences of two users with very high confidence.
In other words, by scanning the social networking component of a site where community members reveal only a small set of their movie ratings (say, for 6 movies), the researchers were able, using a straightforward algorithm, to identify specific users in the Netflix data set along with their complete movie likes and dislikes.
Stunning: in effect, the release of the Netflix data set was a security breach.
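To make the idea concrete, here is a minimal sketch of this kind of linkage. It is not the researchers’ actual algorithm; it is a simplified overlap score, and the movie titles, row IDs, ratings, and the `match_score` and `best_match` helpers are all made up for illustration.

```python
# Hypothetical "anonymized" release: row ID -> {movie: rating on a 1-5 scale}.
anonymized = {
    "row_001": {"Brazil": 5, "Fargo": 4, "Heat": 2, "Alien": 5, "Big": 1, "Jaws": 3, "Rocky": 2},
    "row_002": {"Brazil": 2, "Fargo": 4, "Heat": 5, "Alien": 1, "Big": 4, "Jaws": 3},
    "row_003": {"Fargo": 3, "Heat": 2, "Alien": 5, "Big": 2, "Rocky": 5},
}

# Hypothetical public profile: six ratings one person posted on a community site.
public_profile = {"Brazil": 5, "Fargo": 4, "Heat": 2, "Alien": 5, "Big": 1, "Jaws": 3}


def match_score(public, row, tolerance=1):
    """Count movies rated in both places whose ratings differ by at most `tolerance`."""
    return sum(
        1
        for movie, rating in public.items()
        if movie in row and abs(row[movie] - rating) <= tolerance
    )


def best_match(public, release, min_margin=2):
    """Return the top-scoring row ID, or None if it does not clearly beat the runner-up.

    Assumes the release contains at least two rows.
    """
    ranked = sorted(
        ((match_score(public, row), row_id) for row_id, row in release.items()),
        reverse=True,
    )
    (top_score, top_id), (second_score, _) = ranked[0], ranked[1]
    return top_id if top_score - second_score >= min_margin else None


matched = best_match(public_profile, anonymized)
# Ratings in the matched row that the person never made public.
leaked = {m: r for m, r in anonymized[matched].items() if m not in public_profile}
```

In this toy data, the six public ratings single out one row, and that row then exposes a rating the person never posted. The margin check is there so that an ambiguous match is rejected rather than guessed.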
In 2009, Netflix was in the early stages of launching a second contest with a new set of anonymized ratings containing more subscriber attributes. The FTC convinced Netflix not to publish the data.
This lesson is now reflected in the new FTC framework released in the Protecting Consumer Privacy report I mentioned earlier. In a significant shift, the report recognizes the limited scope of PII in protecting consumer privacy, proposing instead a framework to secure “consumer data that can be reasonably linked to a specific consumer, computer, or other device.” With the new “reasonably linked” wording, the rules place some anonymous data under the same security protections given to social security numbers.
I should add that the report is essentially a list of best practices for companies to follow, though they may be enforceable under existing laws.
Because the FTC is the agency in charge of carrying out GLBA and FCRA, the report has considerable influence over the financial industry. And some observers expect that the FTC’s new focus on “reasonably linked” data may also work its way into Congressional updates to HIPAA and into other new laws (see McCain-Kerry Privacy Bill of Rights).
The take-away is that companies, especially in the financial sector, should treat de-identified data as a potential security violation. That includes tables routinely embedded in internal spreadsheets, presentations, and free-form text documents, as well as data released to third parties. A first step, of course, is to find this stealthy data and evaluate proper security based on a worst case.
Do you know where all your consumer data is, who can look at it, and who actually does?
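One rough way to size up the worst case for a de-identified table is to count how many rows are unique on a combination of seemingly harmless columns. The sketch below is only an illustration under stated assumptions: the records, the column names in `QUASI_IDENTIFIERS`, and the `unique_fraction` helper are hypothetical, and nothing here comes from the FTC report.

```python
from collections import Counter

# Hypothetical de-identified table: no names or social security numbers.
records = [
    {"zip": "10001", "birth_year": 1975, "gender": "F", "balance": 5200},
    {"zip": "10001", "birth_year": 1975, "gender": "F", "balance": 310},
    {"zip": "10001", "birth_year": 1982, "gender": "M", "balance": 870},
    {"zip": "94105", "birth_year": 1990, "gender": "F", "balance": 12000},
]

# Assumed columns that an outsider could plausibly learn from another source.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def unique_fraction(rows, columns):
    """Fraction of rows whose combination of `columns` appears exactly once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    singles = sum(1 for row in rows if counts[tuple(row[c] for c in columns)] == 1)
    return singles / len(rows)


risk = unique_fraction(records, QUASI_IDENTIFIERS)
```

A high fraction suggests that many rows could be “reasonably linked” to one person by anyone who knows those few attributes, so the table deserves the same care as data that carries explicit IDs.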
Image credit: Bjoertvedt