Can Companies Learn to Forget About Their Customers?

In my last few posts, I’ve been focusing on how the rise of social media has forced regulators both here and in the EU to revise their definitions of personal data. With the new emphasis on data that can be “reasonably linked” to an individual, companies may soon have to extend their security controls over a broader range of consumer information. Interestingly, just as the amount and breadth of personal data is increasing, another almost opposite regulatory requirement is looming on the horizon.

In the FTC privacy guidelines report that I’ve been referring to recently, the agency’s Commissioners refer to a “right to be forgotten”. This is not a legal right in the US (yet). But the concept that data should have a natural shelf life and not be retained longer than necessary is reflected in the language of the report’s framework, which calls for “reasonable collection limits” and “sound retention policies”.

Translation: Companies should delete data they no longer need, and also allow consumers to access their data and, under “appropriate circumstances,” purge or suppress it.

Keeping in mind that these are guidelines and best practices, there’s a great hypothetical case study in the FTC report for the mobile space. The Commissioners point out that GPS-generated location data, which is often monitored and saved by smartphone apps, should be treated as identifying information. The reason? Geo-coordinates can be used to re-identify customers when connected with other “disparate bits of information”.

In this particular example, the FTC suggests that mobile software companies should limit their retention of business data (say, check-ins to a restaurant) and also their sharing of it with third parties.

As I’ve been pointing out, public data on the web, especially on social sites, actually expands the amount of corporate consumer data that would fall under the “reasonably linked” definition. And with this new FTC approach to retention, more data is now a candidate for deletion as well.

Back in the EU, the right to be forgotten is an important part of the planned update to the Data Protection Directive. Unlike in the US, it will have the weight of law as EU member countries implement the new rules over the next few years. It will give citizens the right to have their data deleted on request.

In the US, the FTC report may provide clues as to what may be coming out of Congress. The McCain-Kerry Commercial Privacy Bill of Rights, which is currently stalled, does have provisions for data retention limits of personally identifiable information (PII) and other information that may be reasonably used to identify an individual. It also gives consumers some control over their data: they can request that PII and other information be made unidentifiable or not usable. This is less strong than the EU’s right to be forgotten, but it would still require US companies to at least find personal data and then corral it.

The writing is on the wall for US companies: those that implement the FTC best practices for data deletion will be in a much better position when either McCain-Kerry’s Bill of Rights or another law is passed that makes consumer data retention limits and deletion not just a good idea for companies, but a legal obligation.

 

Email vs. Employee: Can We Win the Inbox Race? [INFOGRAPHIC]

Many of us think we are masters of our email universe. By sending messages from a desktop app or mobile device, we’re able to direct and coordinate coworkers anywhere in the organization to solve problems and ultimately get the work done. In a newly released research report, Varonis notes that an ever-increasing volume of emails is forcing knowledge workers to allocate significant time and effort to managing their inbox.

In our survey, we’ve learned that nearly 25% of our respondents receive between 100 and 500 emails per day. And nearly 85% are spending up to 30 minutes or more every day organizing their messages, which adds up to over one and a half weeks of work every year. With 22% reporting that there are between 1,000 and 5,000 emails sitting in their inbox at any given time, filing and organizing message traffic is a hidden task on many workers’ to-do lists.

More than just a productivity drain, the survey reveals that email mishaps are causing real harm: over 62% of our survey population report email mishaps that in some cases led to job loss and even compliance violations.

Is there a way out of the email race? We conclude that email filtering and routing software, along with other techniques to monitor email loads, may become more commonplace in enterprise environments.

In the meantime, one immediate solution, as some email observers have noted, is to be more careful in crafting electronic correspondence. Besides double-checking the email metadata (the To and Cc lists and any file attachments), smart users make sure that their email content is short, meaningful, and solves a problem.

Download the Report

Enjoy, share, embed our infographic and download the full report to learn which data protection activities truly matter.

Digital Work Habits

Embed this infographic on your own site

Copy and paste the code below into your blog post or web page:

<a href="http://blog.varonis.com/email-vs-employee-can-we-win-the-inbox-race-infographic/"><img title="The Quest for Inbox Zero - Infographic" src="http://www.varonis.com/assets/infographics/digital-work-habits.jpg" alt="The Quest for Inbox Zero" width="600" /></a>
<p><small>Like this infographic? Get more <a href="http://blog.varonis.com">digital collaboration</a> tips from <a href="http://www.varonis.com/">Varonis</a>.</small></p>

At-Risk Exchange Data

One of the more interesting benefits of last year’s launch of DatAdvantage for Exchange was the opportunity it presented to talk with different sets of people in our customers’ organizations. Where traditionally we’d worked mostly with security, storage, Windows, or Active Directory teams, DatAdvantage for Exchange spurred meetings with messaging, e-Discovery, and legal folks as well.

E-mail is a business-critical system, period. From an IT perspective, it may be the most critical system—most companies would rather lose their phones for a day than their e-mail. What that has meant for the Messaging folks in charge of Exchange is that simply keeping the lights on—making sure that emails are being delivered promptly and that the repository of stored data is available—has been far and away more important than access control. However, the consequence of focusing on availability rather than confidentiality or integrity has meant that a lot of the controls and auditing that should be in place are sorely lacking.

Data Governance and Exchange

Exchange is an interesting repository from a data governance perspective. The last time I wrote about using Varonis, I talked about how we can combine data classification with permissions exposure to identify the data that’s most at-risk on a file system or SharePoint site. Unlike a file share, Exchange has a flat hierarchy: everyone’s got their own mailbox, and it’s very easy to share out access rights to it. You can, for instance, give someone access to your inbox or calendar. With IT’s help, you can give them the ability to send email on your behalf, or even “as” you. Exchange is exactly like file shares, though, in that mailbox access is rarely reviewed: mailboxes stay shared, and users keep send-as or send-on-behalf-of privileges for a long, long time.

What’s at Risk?

One of the first things we do when we spin up DatAdvantage for Exchange for a customer is to run a report that shows them everywhere someone in the organization has access to a mailbox that isn’t their own.

Everyone has access to their own mailbox by default. It takes some sort of permissions change, though, either on the client (Outlook) side or by the admin on the Exchange server, to grant someone access to another mailbox. One of the things we’re seeing when we do this, by the way, is that the mailboxes that are without question most likely to have been shared are those that are probably considered the most valuable: those of the CEO and other high-level management. While native tools might let you manually (and somewhat painfully) check permissions on a mailbox-by-mailbox basis, Varonis gives you the ability to see where anyone has access to an object that’s not part of their own mailbox.

We take that risk assessment a step further, too, with another report that will show you where people are actually accessing data in mailboxes that don’t belong to them. For good or ill, these are probably the permissions you want to take a look at first from a governance perspective.
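Stripped of the product specifics, the two reports described above are a pair of filters over permission and audit data. Here is a minimal Python sketch, assuming the grants and access events have already been exported into simple records. The `Grant` and `AccessEvent` types, the sample names, and the convention that a mailbox is named after its owner are all hypothetical illustrations, not the DatAdvantage or Exchange data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One permission entry: `user` holds `right` on `mailbox` (hypothetical format)."""
    mailbox: str  # assumption: a mailbox is named after its owner
    user: str
    right: str


@dataclass(frozen=True)
class AccessEvent:
    """One audited access: `user` opened something in `mailbox` (hypothetical format)."""
    mailbox: str
    user: str


def foreign_grants(grants):
    """First report: every grant held by someone other than the mailbox owner."""
    return [g for g in grants if g.user != g.mailbox]


def exercised_grants(grants, events):
    """Second report: foreign grants that the audit trail shows are actually used."""
    used = {(e.mailbox, e.user) for e in events}
    return [g for g in foreign_grants(grants) if (g.mailbox, g.user) in used]


# Made-up sample data for illustration only.
grants = [
    Grant("ceo", "ceo", "FullAccess"),
    Grant("ceo", "assistant", "SendAs"),
    Grant("ceo", "intern", "FullAccess"),
]
events = [AccessEvent("ceo", "ceo"), AccessEvent("ceo", "intern")]

for g in exercised_grants(grants, events):
    print(f"{g.user} is using {g.right} on mailbox {g.mailbox}")
```

In this toy data set the assistant holds a grant that never shows up in the audit trail, while the intern's grant is actively used, which is the one a reviewer would probably want to look at first.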

Photo credit: dcJohn

Banks See Social Media as Big Data Opportunity

Last month I attended a digital advertising conference here in NYC which was swarming with social media benchmarking vendors. If you wanted to learn more about software that measures how your company or brand is faring on Twitter, Facebook, or Pinterest, then this was the place to be. These buzz-monitoring apps make perfect sense for consumer-focused product companies (sneakers, clothing, soft drinks), but I didn’t necessarily connect the dots between social media content and big data for financial service firms.

That is, until I saw this article in American Banker on big data in the banking world. Specifically, BNY Mellon Bank ($1.4 trillion in assets) is launching its own big data project, which will involve collecting and aggregating transactional information from customers across many different systems: their web site, ATM network, customer service, trading desks, and any other relevant interaction points.

The goal is to pull these separate data streams into a centralized data store, and then mine it to learn customer behaviors and preferences. The results will be fed back to their marketing department to help pinpoint customers who would most likely be interested in new bank offerings. BNY Mellon will also use this data to gain more complete awareness of customer needs in their future interactions with the bank.

It doesn’t stop there. BNY Mellon has extended the scope of its big data project beyond its own internal IT operations by harvesting content from the social world—blogs, Twitter, LinkedIn, and other online forums.

How much data can be found in Tweets and posts that would be useful for banks and financial companies?

This is hard to gauge. But according to an IDC report referenced in the American Banker piece, 1.8 trillion gigabytes of data was generated in 2012, with the majority of that considered unstructured social data.

These numbers for social data sound about right. Earlier in the year, Twitter reported its users were sending 340 million tweets per day. Doing a quick back-of-the-envelope calculation (340 million tweets x 140 characters x 365 days), I come up with at least 10,000 gigabytes of data just from Twitter alone. Then if you start adding LinkedIn with its 175 million users, Facebook’s close to 1 billion users, and the millions of active blogs out there, it’s easy to see how unstructured text from social begins to reach the volumes in the IDC range.
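The back-of-the-envelope figure is easy to check in a few lines of Python. This sketch assumes every tweet uses the full 140 characters at one byte per character and counts a gigabyte as 10^9 bytes; both are simplifications chosen for illustration.

```python
TWEETS_PER_DAY = 340_000_000  # figure reported by Twitter, as cited above
BYTES_PER_TWEET = 140         # assumption: 140 characters at 1 byte each
DAYS_PER_YEAR = 365

bytes_per_year = TWEETS_PER_DAY * BYTES_PER_TWEET * DAYS_PER_YEAR
gigabytes_per_year = bytes_per_year / 10**9  # decimal gigabytes

print(f"{gigabytes_per_year:,.0f} GB per year")  # prints "17,374 GB per year"
```

Under those assumptions the raw text comes to roughly 17,000 gigabytes a year, comfortably above the 10,000 gigabyte floor.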

For large financial firms with millions of their own customers, filtering out, processing, and storing what’s relevant clearly falls in the big data solution space. The larger point is that banks are looking at this public data as an auxiliary treasure trove from which they can supplement their existing records with more granular details about their own customers, and even perhaps find potential new markets. Like everyone else, they are also concerned about their brand and the buzz around it.

Lessons learned? Here’s one: even those companies most closely associated with large traditional fixed-field databases (in this case, a financial institution, but also consider, say, insurance, power utilities, and telecom carriers) will by necessity also have to deal with petabytes of content in order to complete the big data puzzle.

Shift in FTC Consumer Privacy Policy May Signal New Laws in US

While the EU is in the process of revising its consumer privacy policies, especially with respect to PII, the US is beginning to rethink its definition of personal data. Those following our blog posts know that policy makers at the FTC earlier in the year released Protecting Consumer Privacy in an Era of Rapid Change. If you haven’t read it for your homework assignment, don’t worry: we have the key take-aways and what they might mean for your company going forward.

I’ve been writing about personally identifiable information or PII in recent posts. This is simply the legalistic way of talking about IDs (social security numbers, phone numbers, addresses) that can lead back to a person. Companies are required to secure PII against unauthorized access; non-PII has fewer formal controls.

It’s a small acronym with big implications: the PII definition is embedded in many US privacy laws, including the Gramm-Leach-Bliley Act (GLBA), the Fair Credit Reporting Act (FCRA), and HIPAA, to name just a few.

Along with the EU, the FTC also now recognizes there’s been a blurring between the PII and non-PII parts of consumer records. Regulatory agencies have pointed to the same culprit: consumers are revealing information about themselves in social networks that can help “re-identify” data once considered to be anonymous.

Here’s what the FTC had to say:

“There is significant evidence demonstrating that technological advances and the ability to combine disparate pieces of data can lead to identification of a consumer, computer, or device even if the individual pieces of data do not constitute PII. Moreover, not only is it possible to re-identify non-PII data through various means, businesses have strong incentives to actually do so.”

What specific evidence was the FTC referring to?

Some may remember that in 2006, Netflix, the movie rental service, announced a public contest to improve on its existing algorithms for suggesting new films to subscribers. To give contestants something to work with, Netflix released an enormous data set of de-identified movie ratings from its database: essentially long rows of numbers indicating a Netflix subscriber’s 1-to-5 evaluation of titles in the Netflix inventory.

Two University of Texas researchers analyzed the public Netflix data, not to enter the contest but to see if they could re-identify Netflix users. Their strategy was to compare the rows of data from Netflix against ratings submitted by subscribers to IMDb, the popular movie information site. The researchers succeeded, identifying the full preferences of two users with very high confidence.

In other words, by scanning the social networking component of a site where community members reveal only a small set of their movie ratings (say, for 6 movies), the researchers were able, using a straightforward algorithm, to identify specific users in the Netflix data set, along with their complete movie likes and dislikes.
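To see why a handful of public ratings can be enough, here is a toy Python sketch of the general idea: look for anonymized rows that agree with a small set of ratings someone has posted under their real name. It uses exact matching on made-up data and is far simpler than whatever scoring the researchers actually used; the function name, the row IDs, and the film titles are all hypothetical.

```python
def reidentify(anonymous_rows, public_ratings, min_matches=6):
    """Return the IDs of anonymized rows that agree with the public ratings.

    anonymous_rows: {row_id: {title: stars}} from the "de-identified" release
    public_ratings: {title: stars} posted openly under a real identity
    min_matches:    how many identical ratings count as a likely match
    """
    candidates = []
    for row_id, ratings in anonymous_rows.items():
        matches = sum(
            1 for title, stars in public_ratings.items()
            if ratings.get(title) == stars
        )
        if matches >= min_matches:
            candidates.append(row_id)
    return candidates


# Made-up data: three anonymized subscribers and one public profile.
anonymous_rows = {
    1001: {"Film A": 5, "Film B": 1, "Film C": 4, "Film D": 2,
           "Film E": 5, "Film F": 3, "Film G": 1, "Film H": 4},
    1002: {"Film A": 5, "Film B": 1, "Film C": 3, "Film D": 2,
           "Film E": 4, "Film F": 3},
    1003: {"Film A": 2, "Film B": 4, "Film C": 4, "Film D": 5, "Film G": 3},
}
public_ratings = {"Film A": 5, "Film B": 1, "Film C": 4,
                  "Film D": 2, "Film E": 5, "Film F": 3}

print(reidentify(anonymous_rows, public_ratings))
```

Once a single row is singled out this way, every other rating in that row (here, Film G and Film H) is attached to a named person, even though the public profile never mentioned those titles.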

Stunning—in effect, the release of the Netflix data set was a security breach.

In 2009, Netflix was in the early stages of launching a second contest with a new set of anonymized ratings containing more subscriber attributes. The FTC convinced Netflix not to publish the data.

This lesson is now reflected in the new FTC framework released in the Protecting Consumer Privacy report I mentioned earlier. In a significant shift, the report recognizes the limited scope of PII in protecting consumer privacy, proposing instead a framework to secure “consumer data that can be reasonably linked to a specific consumer, computer, or other device.” With the new “reasonably linked” wording, the rules place some anonymous data under the same security protections given to social security numbers.

I should add that the report is essentially a list of best practices for companies to follow, though they may be enforceable under existing laws.

Because the FTC is the agency in charge of carrying out GLBA and FCRA, the report has considerable influence over the financial industry. And some observers see that this new focus by the FTC on data “reasonably linked” may also work its way into Congressional updates to HIPAA and into other new laws (see McCain-Kerry Privacy Bill of Rights).

The take-away is that companies, especially in the financial sector, should treat de-identified data or tables as potential security violations when they are routinely embedded in internal spreadsheets, presentations, and free-form text documents, or released to third parties. A first step, of course, is to find this stealthy data and evaluate proper security based on a worst case.

Do you know where all your consumer data is, who can look at it, and who actually does?

Image credit: Bjoertvedt

FTC Decides Facial Images Should Be Secured

I was ready to start writing about the FTC’s recent recommendations on personally identifiable data (or PIIs), when the agency suddenly lobbed a new guideline onto the scene. Released last Monday, Facing Facts: Best Practices for Common Uses of Facial Recognition Technologies is focused on the risks involved in not securing photographic images and data. It’s another indication that US regulatory rules will be leaning toward a broader definition of what it means to relate “anonymous” data back to an individual.

“Facing Facts” doesn’t read like a standard-issue government agency report. It opens by referencing the Spielberg movie, Minority Report, and its vision of a future where ads are served up based on scans of biometric data.

That world is not quite here yet, but the FTC’s larger point with these new guidelines (read: best practices) is that facial recognition technology has become quite sophisticated and potentially disruptive.

Not only is it possible to use non-proprietary software and hardware to pull key information out of digital images, it’s already being done on a commercial basis. Retailers now install digital signage in mall kiosks to serve up ads based on the gender, age, and other demographic information of the consumers viewing the informational screens.

Even more impressive is that existing facial recognition technology has reached a high level of accuracy in comparing photos and finding matches. The National Institute of Standards and Technology (NIST) reports that the false reject rate (the percentage of comparisons incorrectly rejected) has been cut in half every two years. At NIST’s Face Recognition Grand Challenge in 2010, the winning company achieved a false reject rate of 2.1% while still delivering a false acceptance rate of 0.1%.

The FTC’s overall message for companies capturing facial data is three-fold:

  1. Build privacy into products “by design”
  2. Give consumers choice when capturing this information
  3. Be transparent about what’s being done with the images

There are obvious and direct implications for companies in consumer retail and social networking (Facebook typically sees over 1 billion photos monthly). But the scope of this FTC announcement is larger than one might at first guess.

Once upon a time, a file directory of employee digital photos (say, for badges) would not have been considered worthy of much security; technically, it’s not even considered PII. This latest FTC announcement, however, indicates a far different view of what it means to trace ostensibly anonymous data back to an individual.

One of the key points in the FTC’s argument, one I noted in a previous post, is that easy-to-access publicly available data on social networks changes the rules. In a Carnegie Mellon study cited by the FTC, researchers were able to match a set of unidentified photos against existing tagged Facebook photos.

Here’s what the FTC had to say on how companies should now treat digital facial data:

“First, companies should maintain reasonable data security protections for consumer’s images and the biometric information collected from those images to enable facial recognition (for example, unique measurements such as size of features or distance between the eyes or the ears). As the increasing public availability of identified images online … companies that store such images should consider putting protections in place.”

Do your company’s digital photo images—headshots, events, publicity, etc.—have the appropriate access rights? Do you even know where they are?

While you’re pondering that, I’ll be giving Minority Report a second viewing.

 

On Employee Data Theft

Last week Zynga, the social gaming company famous for Farmville and Cityville, filed a lawsuit against former employee Alan Patmore for making off with 763 documents, including business plans and other intellectual property, and bringing them to competitor Kixeye. Patmore doesn’t deny the claim.

It hasn’t been confirmed exactly how Zynga discovered that Patmore nabbed the documents, but I wonder if software, not a human, sounded the alarm.

Sadly, this kind of unethical behavior happens more frequently than you’d think. According to Cyber-Ark’s 2012 global Trust, Security and Passwords Survey, slightly less than half of respondents admitted that if they were fired today, they would pocket proprietary data, even knowing it wasn’t allowed.

Other findings from the survey:

  • 45% said they have access to information that is not relevant to their role
  • 42% indicated they have used admin credentials to access information that was marked confidential
  • 55% believe competitors have obtained their company’s intellectual property

The Zynga case underscores organizations’ need to ensure that only the right users have access to the right data at all times, that access is monitored, and that abuse is flagged.

For every person who is caught stealing intellectual property from an employer, how many fly under the radar?  Insider threats are something organizations need to take seriously.

Want to find out if suspicious behavior is occurring in your environment?  We’ll show you.

 

What Happens When There’s No Cloud?

Earlier today Amazon’s Elastic Compute Cloud experienced a significant outage, bringing down some major websites, notably GitHub, Reddit, and Imgur, among others. While this was certainly an inconvenience for end users and a major headache for the sites in question, what I’m wondering is how many of those Amazon EC2 servers were running internal processes for companies that had moved some or all of their services to the cloud. Risk managers need to keep these kinds of incidents in mind when considering cloud providers.

One of the services many in the enterprise take for granted is data accessibility. If Netflix or Reddit goes down for an afternoon, it’s unlikely that your business productivity will be affected. But what happens if you’ve moved your file server or SharePoint infrastructure to the cloud? We often think of data services as a technology asset, so relocating them to the cloud is mostly a matter of managing costs and SLAs. But the data itself isn’t a technology asset—it’s a business or organizational one. The data doesn’t belong to IT, it belongs to the users who leverage it for (hopefully) revenue-generating activity.

There are certainly going to be use cases where it makes sense to move data services into the cloud, but as service providers we need to keep in mind exactly what tradeoffs exist and what we give up by renting someone’s infrastructure.

A Brief History of US Data Privacy

While IP networks are new relative to the age of our nation, the concept of privacy isn’t. Consider the colonial-era postal service: mail carriers were required to swear an oath to keep letters sealed. Today we have something of a web oath, our privacy policies, which are also mostly based on the honor system.

Even with the postman’s oath in place, the colonial mail system still had enormous security holes. The local postal official often stored the letters in his house or other properties. And by the way, it was not unusual for a postmaster to also run (long pause) a tavern. Matters finally improved when a young sys admin named Benjamin Franklin took over this, ahem, early packet network.

First, Franklin ordered that the post office could not be located in a private house. Then he secured the actual packets: local postmasters had to seal letters in sacks, which were to be unsealed only when they reached their final destination. A primitive but functional detective control, and subsequent deterrent.

To bring the history of privacy and communications technology closer to our era, consider the privacy involved in the early telegraph network. By 1848 the “Victorian Internet” had grown to over two thousand miles of line, mostly in the northeastern US; and just prior to the Civil War, the network extended almost sixty thousand miles.

Like today’s Internet providers, these early telegraph service providers began to collect all kinds of metadata for accounting purposes, which meant keeping copies of the telegrams. Unlike the postal system, a telegram required a far deeper trust in a third party, one that could theoretically steal and abuse the sender’s confidential data. Consumers, though, were willing to trade the more secure postal system for the speedier high-tech telegram.

Sound familiar?

While there were no federal statutes protecting the privacy of telegrams, under pressure from nervous consumers, telegraph operators had strong business incentives to keep their records confidential. Still, there were limits to this privacy protection. For instance, operators still had to release telegrams when a court order was issued.

Today in the EU, the right to privacy has a far stronger legal footing (see my last post) than here in the US. However, Congress is considering tightening up consumer privacy rights with the Kerry-McCain sponsored (note the language here) “Commercial Privacy Bill of Rights”.

The Kerry-McCain privacy bill, the broadest federal protection for online privacy proposed to date, is very explicit about what makes up personally identifiable information, or PII. If passed in its current form, it will have important security and IT implications for US businesses. I’ll cover both the recently published FTC privacy guidelines and the Privacy Bill of Rights in my next post.

In evaluating these new proposed rules, just remember this isn’t the first time in our history that privacy rules and regulations have been worked out for a complex public communications system.

New Zealand’s Leaky Servers Highlight the Need for Information Governance

How a Permissions Report Could Have Plugged the Hole in New Zealand’s Leaky Servers

Earlier this week, Keith Ng blogged about a massive security hole in the New Zealand Ministry of Social Development’s (MSD) network. He was able to walk up to a public kiosk in the Work and Income office and, without cracking a password or planting a Trojan, immediately gain access to thousands upon thousands of sensitive files.

How sensitive, you ask?  Among other things, Ng could browse, read, and modify:

  • Invoices and other financial data
  • Call system logs
  • Files linking children to medical prescriptions
  • Identities of children in special needs programs

Really…frightening.

How did this happen?

Well, there are two possibilities:

1. The kiosks were logged in with an administrative account (e.g., Domain Admin) with full access to all data on the network

2. The kiosks were logged in with a “normal” account, but the file shares were incorrectly permissioned, allowing global access

I find it very hard to believe that the kiosks were logged in as administrators, but we can’t rule it out.  The latter cause, broken/excessive permissions, is actually a very common problem that we address with organizations literally every week at Varonis.

What could have been done to prevent it?

Unplugging the kiosks is only step 1.  The kiosks aren’t the issue.  There are much bigger information governance problems at the heart of this data leak.

Here are some tips that will help address the root cause, not just the catalyst:

1. Locate exposed, sensitive data

  • Use a data classification framework to scan your file servers and determine where your most sensitive content lives, and where it is exposed to too many people

Once you’ve located the sensitive stuff, make sure only the right people have access, and then monitor activity on that sensitive data to make sure that authorized users aren’t abusing their access.

If I’m a CSO, I want a solution that tells me at any given time exactly where all my sensitive data is, where it is over-exposed, and who is accessing it.  If someone creates a file with a social security number or patient ID and plops it onto a public share that a kiosk can see, I want my team to be alerted automatically.
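The classification step can be sketched in a few lines of Python. This is a minimal illustration, assuming plain-text files and a single rule that looks for strings shaped like US social security numbers; a real classification framework would need many more rules, file-format parsers, and some form of validation. The function names are made up for this example.

```python
import re
from pathlib import Path

# Strings shaped like 123-45-6789. One illustrative rule, not a full classifier.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_ssn_like(text):
    """Count SSN-shaped strings in a piece of text."""
    return len(SSN_PATTERN.findall(text))


def find_sensitive_files(root):
    """Yield (path, hit_count) for files under `root` containing SSN-like strings."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = count_ssn_like(text)
        if hits:
            yield path, hits
```

Calling `find_sensitive_files` on the root of a share produces the list of files worth cross-checking against permissions; the alerting described above would sit on top of a scan like this one.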

2. Identify and remove global access groups from ACLs

  • Figure out where “Everyone” or “Authenticated Users” appears on ACLs and remove them

This can be tough because (a) it’s not trivial to crawl every ACL on every file server or NAS device looking for “Everyone” and (b) you have to pull global access without cutting off people who really need the data.
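To give a feel for what the crawl involves, here is a small Python sketch. Note the swap: it does not read Windows ACLs at all. Instead it walks a POSIX file tree and flags anything whose permission bits grant access to “other”, which is the closest POSIX counterpart to a global access group. Finding “Everyone” or “Authenticated Users” on NTFS or a NAS device would require reading the actual ACL entries with platform-specific tooling. The function name is made up for this example.

```python
import os
import stat


def world_accessible(root):
    """Yield (path, mode_string) for entries under `root` that grant any
    read, write, or execute permission to "other" (POSIX permission bits)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or is unreadable: skip it
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are not meaningful
            if mode & stat.S_IRWXO:
                yield path, stat.filemode(mode)
```

Even in this simplified form, the scan only answers the first half of the problem. Deciding which of the flagged folders can be locked down without cutting off legitimate users still needs information about who actually uses the data.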

3. Watch your super users

  • Set up alerts for whenever someone is granted super user/administrator privileges
  • Periodically review the list of people who have privileged access
  • Review your audit trail to see what super users are doing with their elevated rights

Even if the kiosks were mistakenly set up to run under a super user account, if MSD were reviewing access activity they likely would have noticed an inordinate amount of super user activity from the public kiosks’ IP addresses.
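A review like that can start out very simple. The Python sketch below counts privileged events per source address and reports any that come from a kiosk. The event format (dictionaries with `source_ip` and `privileged` keys), the function name, and the sample addresses are assumptions made up for illustration; they do not reflect MSD’s systems or any particular audit log.

```python
from collections import Counter


def flag_privileged_sources(events, kiosk_ips, threshold=1):
    """Return {ip: count} for kiosk IPs with at least `threshold` privileged events.

    events:    iterable of dicts with hypothetical keys "source_ip" and "privileged"
    kiosk_ips: set of addresses known to belong to public kiosks
    """
    counts = Counter(
        e["source_ip"] for e in events
        if e["privileged"] and e["source_ip"] in kiosk_ips
    )
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Made-up audit records using documentation-only IP ranges.
events = [
    {"source_ip": "192.0.2.10", "privileged": True},
    {"source_ip": "192.0.2.10", "privileged": True},
    {"source_ip": "192.0.2.10", "privileged": False},
    {"source_ip": "198.51.100.7", "privileged": True},
    {"source_ip": "192.0.2.11", "privileged": False},
]
kiosk_ips = {"192.0.2.10", "192.0.2.11"}

print(flag_privileged_sources(events, kiosk_ips))  # prints {'192.0.2.10': 2}
```

Any privileged activity from a public kiosk is suspicious, which is why the default threshold is a single event.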

4. Assign and involve data owners

  • Access to children’s medical records, for instance, should be granted and reviewed not by IT, but by the business unit that is responsible for managing patients (e.g., a medical director).

By transferring this responsibility to the people who are most equipped to make access control decisions (i.e. data owners), not only do you end up with better decisions, but you also relieve some of the burden on IT.

How hard can it be?

Many of the comments on Ng’s posts were along the lines of “Rookie mistake!” or “Security 101!” I assure you, information governance is much harder than people think, especially in an age where data is somewhat of a contagion, being created and replicated at such a staggering pace.

To these commenters, I’d like to propose a simple question: without an automated solution, how would MSD’s IT department know which folders were mistakenly open to everyone?

It takes one frustrated person 30 seconds to add “Everyone” to an ACL, but it could take years to find and correct that access control failure.  Worse yet, once found, how do you know whether the over-exposed data was stolen by someone who isn’t as harmless as Keith Ng?

That’s the question New Zealand’s government is facing right now.


What is the state of your data protection?

If you’d like a free data security assessment courtesy of Varonis, please let us know.
