Archive for: October, 2012

FTC Decides Facial Images Should Be Secured

I was ready to start writing about the FTC’s recent recommendations on personally identifiable information (PII) when the agency suddenly lobbed a new guideline onto the scene. Released last Monday, Facing Facts: Best Practices for Common Uses of Facial Recognition Technologies focuses on the risks of leaving photographic images and data unsecured. It’s another indication that US regulatory rules will be leaning toward a broader definition of what it means to relate “anonymous” data back to an individual.

“Facing Facts” doesn’t read like a standard-issue government agency report. It opens by referencing the Spielberg movie Minority Report and its vision of a future where ads are served up based on scans of biometric data.

That world is not quite here yet, but the FTC’s larger point with these new guidelines–read as best practices–is that facial recognition technology has become quite sophisticated and potentially disruptive.

Not only is it possible to use non-proprietary software and hardware to pull key information out of digital images, it’s already being done on a commercial basis. Retailers now install digital signage in mall kiosks to serve up ads based on the gender, age, and other demographic information of the consumers viewing the informational screens.

Even more impressive, existing facial recognition technology has reached a high level of accuracy in comparing photos and finding matches. The National Institute of Standards and Technology (NIST) reports that the false reject rate (the percentage of comparisons incorrectly rejected) has been cut in half every two years. At NIST’s Face Recognition Grand Challenge in 2010, the winning company achieved a false reject rate of 2.1% while still delivering a false acceptance rate of 0.1%.

The FTC’s overall message for companies capturing facial data is three-fold:

  1. Build privacy into products “by design”
  2. Give consumers choice when capturing this information
  3. Be transparent about what’s being done with the images

There are obvious and direct implications for companies in consumer retail and social networking (Facebook typically sees over 1 billion photos monthly). But the scope of this FTC announcement is larger than one might at first guess.

Once upon a time, a file directory of employee digital photos (say, for badges) would not have been considered worthy of much security; it’s technically not even considered PII. This latest FTC announcement, however, indicates a far different view of what it means to trace ostensibly anonymous data back to an individual.

One of the key points in the FTC’s argument (one I noted in a previous post) is that easy-to-access, publicly available data on social networks changes the rules. In a Carnegie Mellon study cited by the FTC, researchers were able to match a set of unidentified photos against existing tagged Facebook photos.

Here’s what the FTC had to say on how companies should now treat digital facial data:

“First, companies should maintain reasonable data security protections for consumer’s images and the biometric information collected from those images to enable facial recognition (for example, unique measurements such as size of features or distance between the eyes or the ears). As the increasing public availability of identified images online … companies that store such images should consider putting protections in place.”

Do your company’s digital photo images—headshots, events, publicity, etc.—have the appropriate access rights? Do you even know where they are?

While you’re pondering that, I’ll be giving Minority Report a second viewing.

 

On Employee Data Theft

Last week Zynga, the social gaming company famous for Farmville and Cityville, filed a lawsuit against former employee Alan Patmore for making off with 763 documents—including business plans and other intellectual property—and bringing them to competitor Kixeye. Patmore doesn’t deny the claim.

It hasn’t been confirmed exactly how Zynga discovered that Patmore nabbed the documents, but I wonder if software, not a human, sounded the alarm.

Sadly, this kind of unethical behavior happens more frequently than you’d think.  According to Cyber-Ark’s 2012 global Trust, Security and Passwords Survey, slightly less than half of respondents admitted that if they were fired today, they would pocket proprietary data – even knowing it wasn’t allowed.

Other findings from the survey:

  • 45% said they have access to information that is not relevant to their role
  • 42% indicated they have used admin credentials to access information that was marked confidential
  • 55% believe competitors have obtained their company’s  intellectual property

The Zynga case underscores organizations’ need to ensure that only the right users have access to the right data at all times, access is monitored, and abuse is flagged.

For every person who is caught stealing intellectual property from an employer, how many fly under the radar?  Insider threats are something organizations need to take seriously.

Want to find out if suspicious behavior is occurring in your environment?  We’ll show you.

 

What Happens When There’s No Cloud?

Earlier today Amazon’s Elastic Compute Cloud experienced a significant outage, bringing down some major websites, notably GitHub, Reddit, and Imgur, among others. While certainly an inconvenience for end users and a major headache for the sites in question, what I’m wondering is how many of those Amazon EC2 servers were running internal processes for companies that had moved some or all of their services to the cloud? Risk managers need to keep these kinds of incidents in mind when considering cloud providers.

One of the services many in the enterprise take for granted is data accessibility. If Netflix or Reddit goes down for an afternoon, it’s unlikely that your business productivity will be affected. But what happens if you’ve moved your file server or SharePoint infrastructure to the cloud? We often think of data services as a technology asset, so relocating them to the cloud is mostly a matter of managing costs and SLAs. But the data itself isn’t a technology asset—it’s a business or organizational one. The data doesn’t belong to IT, it belongs to the users who leverage it for (hopefully) revenue-generating activity.

There are certainly going to be use cases where it makes sense to move data services into the cloud, but as service providers we need to keep in mind exactly what tradeoffs exist and what we give up by renting someone’s infrastructure.

A Brief History of US Data Privacy

While IP networks are new relative to the age of our nation, the concept of privacy isn’t. Consider the Colonial-era postal service – mail carriers were required to swear an oath to keep letters sealed. Today we have somewhat of a web-oath—our privacy policies—which are also mostly based on the honor system.

Even with the postman’s oath in place, the colonial mail system still had enormous security holes. The local postal official often stored the letters in his house or other properties. And by the way, it was not unusual for a postmaster to also run (long pause) a tavern. Matters finally improved when a young sys admin named Benjamin Franklin took over this, ahem, early packet network.

First Franklin ordered that the post office could not be located in a private house. Then he secured the actual packets: local postmasters had to seal letters in sacks, which were to be unsealed only when they reached their final destination. A primitive but functional detective control, and subsequent deterrent.

To bring the history of privacy and communications technology closer to our era, consider the privacy involved in the early telegraph network. By 1848 the “Victorian Internet” had grown to over two thousand miles of line, mostly in the northeastern US; and just prior to the Civil War, the network extended almost sixty thousand miles.

Like the Internet, these early telegraph service providers began to collect all kinds of metadata for accounting purposes, which meant keeping copies of the telegrams. Unlike the postal system, a telegram required a far deeper trust in a third party, one that could theoretically steal and abuse its customers’ confidential data. Consumers, though, were willing to trade the more secure postal system for the speedier high-tech telegram.

Sound familiar?

While there were no federal statutes protecting the privacy of telegrams, under pressure from nervous consumers, telegraph operators had strong business incentives to keep their records confidential. Still, there were limits to this privacy protection.  For instance, operators still had to release telegrams when a court order was issued.

Today in the EU, the right to privacy has a far stronger legal footing (see my last post) than here in the US. However, Congress is considering tightening up consumer privacy rights with the Kerry-McCain sponsored—note the language here—“Commercial Privacy Bill of Rights”.

The Kerry-McCain privacy law—the broadest Federal protection for online privacy to date—is very explicit about what makes up personally identifiable information, or PII. If passed in its current form it will have important security and IT implications for US businesses. I’ll cover both the recently published FTC privacy guidelines and the Privacy Bill of Rights in my next post.

In evaluating these new proposed rules, just remember this isn’t the first time in our history that privacy rules and regulations have been worked out for a complex public communications system.

New Zealand’s Leaky Servers Highlight the Need for Information Governance

How a Permissions Report Could Have Plugged the Hole in New Zealand’s Leaky Servers

Earlier this week, Keith Ng blogged about a massive security hole in the New Zealand Ministry of Social Development’s (MSD) network.  He was able to walk up to a public kiosk in the Work and Income office and—without cracking a password or planting a Trojan—immediately gain access to thousands upon thousands of  sensitive files.

How sensitive, you ask?  Among other things, Ng could browse, read, and modify:

  • Invoices and other financial data
  • Call system logs
  • Files linking children to medical prescriptions
  • Identities of children in special needs programs

Really…frightening.

How did this happen?

Well, there are two possibilities:

1. The kiosks were logged in with an administrative account (e.g., Domain Admin) with full access to all data on the network

2. The kiosks were logged in with a “normal” account, but the file shares were incorrectly permissioned, allowing global access

I find it very hard to believe that the kiosks were logged in as administrators, but we can’t rule it out.  The latter cause, broken/excessive permissions, is actually a very common problem that we address with organizations literally every week at Varonis.

What could have been done to prevent it?

Unplugging the kiosks is only step 1.  The kiosks aren’t the issue.  There are much bigger information governance problems at the heart of this data leak.

Here are some tips that will help address the root cause, not just the catalyst:

1. Locate exposed, sensitive data

  • Use a data classification framework to scan your file servers and determine where your most sensitive content lives, and where it is exposed to too many people

Once you’ve located the sensitive stuff, make sure only the right people have access, and then monitor activity on that sensitive data to make sure that authorized users aren’t abusing their access.

If I’m a CSO, I want a solution that tells me at any given time exactly where all my sensitive data is, where it is over-exposed, and who is accessing it.  If someone creates a file with a social security number or patient ID and plops it onto a public share that a kiosk can see, I want my team to be alerted automatically.
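To make the classification step concrete, here is a minimal Python sketch of the kind of scan a real data classification framework performs. The SSN regex and plain-text reading are simplifications for illustration; commercial tools handle many file formats, additional patterns, and false-positive checks.

```python
import os
import re

# Simple pattern for US social security numbers (NNN-NN-NNNN).
# Real classification engines use far more robust rules.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_exposed_ssns(root):
    """Walk a share and flag text files containing SSN-like strings."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as f:
                    if SSN_PATTERN.search(f.read()):
                        hits.append(path)
            except OSError:
                continue  # unreadable file; skip
    return hits
```

Point a scan like this at the shares a kiosk can reach, and the alert list writes itself.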

2. Identify and remove global access groups from ACLs

  • Figure out where “Everyone” or “Authenticated Users” appears on ACLs and remove them

This can be tough because a.) it’s not trivial to crawl every ACL on every file server or NAS device looking for “Everyone” and b.) you have to pull global access without cutting off people who really need the data.
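As a rough illustration of point a.), here is a Python sketch that parses saved output from Windows’ `icacls` command (e.g., `icacls D:\shares /t > acls.txt`) and flags paths whose ACLs mention a global access group. It assumes icacls’ standard layout and paths without spaces, so treat it as a starting point, not a production scanner.

```python
# Hypothetical helper: scan saved `icacls` output for global access groups.
GLOBAL_GROUPS = ("Everyone", "NT AUTHORITY\\Authenticated Users")

def find_global_access(icacls_output):
    """Return paths whose ACL entries grant access to a global group."""
    exposed = []
    current_path = None
    for line in icacls_output.splitlines():
        if line and not line.startswith(" "):
            # icacls prints "C:\path FIRST_ACE" on the path's first line
            current_path = line.split(" ", 1)[0]
        if current_path and any(g in line for g in GLOBAL_GROUPS):
            if current_path not in exposed:
                exposed.append(current_path)
    return exposed
```

Even a crude crawl like this turns "find every open share" from guesswork into a report.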

3. Watch your super users

  • Set up alerts for whenever someone is granted super user/administrator privileges
  • Periodically review the list of people who have privileged access
  • Review your audit trail to see what super users are doing with their elevated rights

Even if the kiosks were mistakenly set up to run under a super user account, had MSD been reviewing access activity, they likely would have noticed an inordinate amount of super user activity from the public kiosks’ IP addresses.

4. Assign and involve data owners

  • Access to children’s medical records, for instance, should be granted and reviewed not by IT, but by the business unit that is responsible for managing patients (e.g., a medical director).

By transferring this responsibility to the people who are most equipped to make access control decisions (i.e. data owners), not only do you end up with better decisions, but you also relieve some of the burden on IT.

How hard can it be?

Many of the comments on Ng’s posts were along the lines of “Rookie mistake!” or “Security 101!” I assure you, information governance is much harder than people think, especially in an age where data is somewhat of a contagion, being created and replicated at such a staggering pace.

To these commenters, I’d like to propose a simple question: without an automated solution, how would MSD’s IT department know which folders were mistakenly open to everyone?

It takes one frustrated person 30 seconds to add “Everyone” to an ACL, but it could take years to find and correct that access control failure.  Worse yet, once found, how do you know whether the over-exposed data was stolen by someone who isn’t as harmless as Keith Ng?

That’s the question New Zealand’s government is facing right now.


What is the state of your data protection?

If you’d like a free data security assessment courtesy of Varonis, please let us know.

The New Privacy Environment: European Union Leads the Way on Personal Data Protection

We all understand the risks in accidentally revealing a social security number. But are there other pieces of less identifying or even anonymous information that taken together act like a social security number? The European Union is breaking new ground on consumer privacy as it begins to reform its own regulations. The EU’s broader ideas on personal identity have even made their way across the pond into proposed new US regulations.

The history of the European Union’s consumer privacy and data security regulations begins with its 1995 Data Protection Directive, or Directive 95/46/EC for security wonks. EU directives provide guidance to member nations’ legislatures, which then are free to craft their own specific laws. The DPD has been influential in shaping the vocabulary and, less charitably, the jargon of the consumer privacy discussion on both sides of the Atlantic.

In the US, the starting point for discussion on data security is Sarbanes-Oxley, which became law in 2002. In comparing the two, it’s fair to say the DPD was more focused on securing consumer information but more inclusive than SOX in covering both public and private companies. To this day there is no single comprehensive US law on consumer privacy.

The EU’s original directive is significant because it defined personal data as “information relating to an identified or identifiable natural person”. For example, by EU rules, street address, name, and phone number are personal data; height, eye color, and model of car you drive are not. This notion of personal data as a type of key is part of the definition used in privacy laws outside the EU–including the US. In North America, though, we’ve come up with our own term for personal data, calling it instead “personally identifiable information” or PII.

By the way, the EU regulators intentionally created a less explicit definition of personal data so that it would encompass new technologies. In 2012, data related to an identifiable person could now be an email address, IP address, and for some EU nations, even a photo image.

To bring the story up to date, security experts began to realize that along with personal data there was other data (call it quasi-personal) that, if released, could also be used to relate back to an individual. The data magic to accomplish identification typically requires matching a collection of anonymous data points (birth dates or years, zip codes, ethnicity, and perhaps car model driven) against publicly available databases.

For example, there are well documented cases involving anonymized hospital discharge records subsequently used to re-identify the original patients!

With Facebook now up to 1 billion active users, it’s fair to say that the Web is overflowing with personal data at all levels of detail. Essentially, social networks have provided hackers, the new ominous player on the scene, with a huge public repository to match against (cf. Matt Honan).

To get a better understanding of how it’s possible to re-identify an individual, let’s review a variation on the aforementioned case. While the technique is not always guaranteed to uniquely identify a person (this depends on the available related information), it can often produce a narrowed down list of highly likely subjects.

Suppose, for argument’s sake, a European mortgage company analyzes a health report from a large public hospital. The records show that five individuals were being treated for a rare disease. Their ages were also published. Assuming the patients live near the hospital, the mortgage lender then simply filters its database on zip code and birth year. Working with a smaller set of records, it then scans social media sites or other online forums, filtering on the retrieved names and other data, all the while looking, for say, “get well” messages. If it finds a few matches, and with the additional new data points from the social site … I think you see where this is leading.
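A toy Python version of that scenario shows how little code the first filtering step takes. All names and records below are fabricated for illustration; a real attack would match against far larger databases.

```python
# Toy sketch of quasi-identifier matching: join an "anonymized" hospital
# record against a customer database on zip code and birth year.
hospital_records = [
    {"zip": "1017", "birth_year": 1961, "condition": "rare disease"},
]

customer_db = [
    {"name": "A. Jansen",   "zip": "1017", "birth_year": 1961},
    {"name": "B. Smit",     "zip": "1017", "birth_year": 1975},
    {"name": "C. de Vries", "zip": "3011", "birth_year": 1961},
]

def candidate_matches(record, database):
    """Filter the database on the record's quasi-identifiers."""
    return [row for row in database
            if row["zip"] == record["zip"]
            and row["birth_year"] == record["birth_year"]]

suspects = candidate_matches(hospital_records[0], customer_db)
# One candidate remains; a follow-up scan of social media ("get well"
# messages, etc.) could then confirm the identity.
```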

The good news is that the EU countries have long recognized that their laws have not kept pace. And the EU governing body is currently in the process of reforming the 1995 directive, taking into account the new realities of public data on the Web and the blurring of personal and anonymous data. To get a sense of the EU’s new thinking on personal data, refer to this work-in-progress paper.

And there are also rumblings of change in the US along the same lines as the EU reforms.

I’ll be writing more about US laws and what this will all mean for your company’s data protection policies in future posts.

Image credit: Rama

Network Computing’s Review of DatAdvantage for Exchange

For many organizations, Microsoft Exchange is their central nervous system: it cannot crash and cannot be compromised, or business will come to a screeching halt. Varonis is honored that so many customers look to DatAdvantage for Exchange to help manage and protect their critical infrastructure, knowing it will give them the visibility they need without getting in the way.

If you aren’t familiar with DatAdvantage for Exchange, or want an unbiased opinion, take a look at Network Computing’s excellent product review.

And if you’re interested in seeing a live demo or evaluating the software for free in your own environment, just let us know!

Read the Review

Visibility is a Prerequisite to Security

Last night I watched a fantastic episode of This Week in Startups with guest Aaron Levie of Box.net. Aaron is a remarkable young CEO who really seems to understand and care about enterprise software, which is a rare combination.

One of the themes of the interview was that CIOs and IT departments at large organizations are starting to embrace the cloud, bottom up and top down.

He also noted that Fortune 500 companies are starting to develop or acquire cloud solutions to put in their portfolios (e.g., Oracle buys RightNow, SAP buys SuccessFactors). Cloud, cloud, cloud.

Then, about 23 minutes into the interview, the elephant in the room rears its giant head: but what about security?

Even the most progressive enterprises, Aaron remarks, have the philosophy: use any device you want (Mac, PC, iPhones), use any software you want, but secure the data at all costs.

They’d be foolish not to. We’re not talking about MP3s and funny cat photos. We’re talking about intellectual property, source code, patents, legal and HR documents, etc.

As the definition of “secure” evolves, every IT organization is faced with hard decisions.

We have to have a security model that fits today’s distributed work model. What is the correct balance between security and efficiency? Between availability and lockdown?

Levie goes on:

We have to redefine what it means to be secure and what it means to manage security. Then you move more into this category where visibility is security. If I have far more visibility into where my data is, who’s using it, every access, every event on it — maybe it’s a little more open, but people will use the product and I’ll actually see what’s going on with the data.

Visibility, by itself, is not security. If I put my company’s financial statements and intellectual property in the cloud and have complete visibility into who is accessing that data, all that would guarantee is that I could watch people steal it. Imagine if banks just installed security cameras and declared themselves secure. When security tapes and audit trails are actually used to catch and address undesirable activity, auditors call them “detective controls.”

More accurately, visibility is a prerequisite to security.  Detective controls are a critical piece to the puzzle – they allow you to intelligently configure “preventive controls” and make sure they are working as intended. The combination of detective and preventive controls help you prevent catastrophes and detect what you’ve failed to prevent altogether.

Truly secure organizations build policies, procedures, and controls on top of visibility, including entitlement reviews, data loss prevention, content classification, defensible disposition, eDiscovery, disaster recovery, and many more.  These systems for preventing and detecting security problems have taken years to mature.

This blueprint will be required to reach a similar level of maturity in the cloud.

If you’re getting ready to move business data into the cloud, consider asking these questions:

  • Even if one cloud provider gives you complete visibility into accessibility and usage, how do you integrate it with your existing infrastructure? Other cloud data?
  • Visibility into access controls and usage aren’t the only control you need, either. Content inspection, business continuity and disaster recovery, retention policies, authorization processes—all these need to be addressed.
  • IT knows how hard it is to standardize these controls on infrastructure that’s been around for years and is completely under its control. Cloud vendors are still figuring this out, their control capabilities vary greatly, and each vendor’s interface may well be different.
  • Organizations need to set a minimum standard of controls for every platform, cloud or on-premise.  Access control visibility and automatable execution of changes, complete auditing that can integrate with other technologies, content inspection, automated archiving, etc.

Addressing these data management concerns on your own terms is difficult enough; convincing Google, Amazon, and Box to play by your rules when they have their own agendas and, not to mention, thousands of other customers to satisfy—well, that’s a whole new ball game.

The promise of the cloud is really exciting, but there’s a long way to go after visibility.

Image credit: praweena

Varonis Data Governance Suite 5.8: Faster, Leaner, Lower Cost

We’re extremely excited to announce the release of version 5.8 of the Varonis Data Governance Suite!

This release is packed with major architectural changes that not only increase performance, but also reduce your total cost of ownership and make managing your Varonis infrastructure faster and easier than ever.

What’s new?

Here are some of the key features in 5.8:

  • Collectors: New component introduced for metadata collection that no longer requires Microsoft SQL, resulting in better performance, easier deployment, and a lower Total Cost of Ownership (TCO).
  • Management Console and Scheduler: Single point to manage and control the entire Varonis infrastructure, simplifying installation and monitoring.
  • Incremental File Walk: Ability to incrementally scan/walk only the changed permissions on the file system rather than the entire file system, reducing system and network overhead and boosting overall efficiency.
  • Database Separation: Support for SQL farms external to Varonis components, increasing architecture flexibility and reducing total cost of ownership.
  • Auditing Actions: Full audit of activities within DatAdvantage increases organizational security posture by providing immediate accountability for administrators.
  • User and Group Creation: Users and groups can be created and edited from the DatAdvantage interface, increasing administrative functionality and flexibility.

Our CEO and co-founder, Yaki Faitelson:

“We have changed the architecture of the product so that the people who already rely heavily on DatAdvantage to improve management and security for their unstructured data platforms can integrate it into their workflow even more seamlessly, while those new to the technology will benefit from the experience and input from those who have come before them.”

If you’re interested in seeing a demo or evaluating Varonis in your environment, contact us today.

Some Amazing Things About Your File System

I was recently asked by one of our sales people to come up with a few unusual facts about user behaviors or statistics related to networked file systems. She was looking for a good anecdote that would make our customers reconsider conventional IT wisdom. I think I’ve found something to raise an IT admin’s eyebrow.

To be fair, my discovery has been known about in a general way for a long time. It’s even become part of our popular culture. No, I don’t mean Murphy’s Law, which is well-appreciated by IT journeymen. I am referring to the proverbial 80-20 rule, which was explained to me, with more than a little hand waving, when I first started in IT. It went something like this: “80% of the data is explained by 20% of the facts”.

As with many simply stated rules, 80-20 hides some deep ideas. It turns out to describe key stats in complex systems spanning economics, marketing, sociology, as well as a few physical sciences. In recent years, the rule has been found to apply to another and more familiar complex creation–the Internet.

The fancier way to describe the 80-20 rule is to say that distributions of data (a graph of web site visits, web link references, and, as we’ll see later, file sizes) are governed by so-called power laws. Long tails or fat tails are still other terms used to talk about the relative weightiness of events at the extreme end of the data curve, that is, compared to the thinner limits of the more beloved bell-shaped curve.

There is strong evidence for the rule. Much has been written about fat tails with respect to web stats. You can partially satisfy your own curiosity by looking at the web traffic data collected by Quantcast. According to them, perennial top sites such as Facebook, Google, Yahoo, Twitter, MSN.com and a few others attract a disproportionate amount of total web visits.

From a quick back-of-the-envelope calculation using the Quantcast numbers, I tallied close to 80% of monthly visitor traffic against just 40 of Quantcast’s top-ranked sites. These 40 sites, out of almost 400 million total web sites worldwide, represent way, way less than 1%. That’s a very skewed 80-20 pattern: closer to 80-.00001!
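For the skeptical, the site-share arithmetic is easy to check (the ~80% traffic figure is my tally from Quantcast, not computed here):

```python
# 40 top sites out of roughly 400 million sites worldwide.
top_sites = 40
total_sites = 400_000_000
share_of_sites = top_sites / total_sites * 100  # as a percentage

# share_of_sites works out to 0.00001%, hence the "80-.00001" pattern.
```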

What does this have to do with file systems? Networked file servers are complex enough, with a large community of users accessing an ever-changing supply of resources (files, directories, and access permissions), to potentially behave in ways similar to the Web.

In graphing the distributions of file sizes, researchers long ago noticed–long pause–a similar kind of skewed curve. While it may not be a true power law, the telltale fat tail shows up for extreme file sizes. For example, you can check out this paper from the folks at Microsoft Research wherein they plot byte-counts for their corporate file system.

Being curious about my own aged home computer, a 10-year-old Dell running Windows XP, I decided to take a quick peek at a histogram of its file system, using a freebie utility. Here’s what I learned: out of almost 70,000 files taking up about 29 GB of space, a mere 83 files, or a shade more than 0.1%, accounted for an astonishing 26% of the disk space!
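You can reproduce this kind of measurement yourself with a few lines of Python. The default fraction of 0.1% mirrors the 83-out-of-70,000 figure above; point it at any directory tree you have read access to.

```python
import os

def top_heavy(root, fraction=0.001):
    """Share of total bytes held by the largest `fraction` of files."""
    sizes = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                continue  # broken links, permission errors, etc.
    sizes.sort(reverse=True)
    top_n = max(1, int(len(sizes) * fraction))
    return sum(sizes[:top_n]) / sum(sizes)
```

For example, `top_heavy(r"C:\Users")` reports just how skewed your own disk is.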

Skewed disk utilization graph

Even though I’m familiar with the research, I was still a little stunned to see the fat tail pattern play out on my personal computer. By the way, Microsoft Outlook® .pst files can reach huge sizes–you’ve been warned!

What’s going on to explain these renegade fat tails in corporate file systems?

One of the proposed ideas is that we, as file users, are copying existing files and then editing–adding or subtracting content–from them for the next person down the chain to modify and so on. Essentially, users are successively multiplying a file size by a random factor, and this has been shown to lead to fat-tailed file size curves.
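This multiplicative-edit model is easy to simulate. The sketch below (my own illustration, not taken from the research cited) rescales each file’s size by a random factor per “edit generation”; the resulting population is fat-tailed, and a tiny top slice of files ends up holding a disproportionate share of the bytes.

```python
import random

random.seed(42)  # deterministic for illustration

def simulate_file_sizes(n_files=10_000, generations=20, base=10.0):
    """Grow each file by a random multiplicative factor per generation."""
    sizes = []
    for _ in range(n_files):
        size = base
        for _ in range(generations):
            size *= random.lognormvariate(0, 0.5)  # random edit factor
        sizes.append(size)
    return sizes

sizes = sorted(simulate_file_sizes(), reverse=True)
# Share of all bytes held by the biggest 1% of simulated files:
top_share = sum(sizes[: len(sizes) // 100]) / sum(sizes)
```

Run it and the top 1% of simulated files hold far more than 1% of the total bytes, just like my XP box.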

This copying behavior may also have a herd component to it. That is, we tend to edit files that have been copied or accessed more frequently. Preferences for popular files—or web sites or social networks—are also known to lead to fat-tailed distributions.

Based on my own experience as a user, I plead guilty to not only amending and expanding existing files but also echoing file permissions. When it came to read-write-execute or ACE metadata, I was definitely a member of the herd, following what someone else had done—that is, until I started at Varonis.

There’s an IT moral to all this. Your user community is, unfortunately, propagating the “everyone” group or other harmful ACEs, and also unknowingly helping to push files into the red-zone of the file size curve.

For my money, herding behaviors alone are reason enough to use Varonis’s DatAdvantage to really understand and manage your organization’s networked file systems. A file system and its community of users form a kind of social network in which it is quite easy to amplify bad habits.

So you’ll want Varonis’s software to automatically spot these patterns and then take more direct control over shaping your file system’s overall profile.

Image credit: Schnobby