Archive for: April, 2012

5 Things You Should Know About Big Data

Big data is a very hot topic, and with the Splunk IPO last week seeing a 1999-style spike, the bandwagon is overflowing.  We’re poised to see many businesses pivoting into the big data space or simply slapping a big data sticker on their products (accurate or not) just to ride the wave.

This post aims to help educate you with a few byte-sized big data concepts (not just trivia) so that you can distinguish the substance from the hype.

1. Big data is distributed data

Big data is a nebulous term with many different definitions.  The key thing to remember is that in this day and age, big data is distributed data.  This means the data is so massive it cannot be stored or processed by a single node.

The days of buying a single big iron server from IBM or Sun to handle all your business intelligence needs are long gone.  It’s been proven by Google, Amazon, Facebook, and others that the way to scale fast and affordably is to use commodity hardware to distribute the storage and processing of our massive data streams across several nodes, adding and removing nodes as needed.

2. You’re going to hear the words “Hadoop” and “MapReduce”

What is Hadoop?   It is an open source platform for consolidating, combining and understanding large-scale data in order to make better business decisions. Hadoop is the technology powering many (but not all) big data analytics infrastructures.

There are 2 key parts to Hadoop:

  • HDFS (Hadoop distributed file system) which lets you store data across multiple nodes.
  • MapReduce which lets you process data in parallel across multiple nodes.

Although Hadoop is one of the most popular solutions for crunching big data, there are plenty of others.  Big data can’t be shoehorned into one flavor of technology.  The important characteristic is that you’re able to draw insights from large quantities of data, independent of specific technologies.

3. You can understand MapReduce without a degree from Stanford

The best plain English explanation of MapReduce I’ve encountered (paraphrasing):

We want to count all the books in the library.  You count up shelf #1.  I count up shelf #2.  That’s map. Now we get together and add our individual counts.  That’s reduce.
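The library analogy can be sketched in a few lines of Python. This is only an illustration running on one machine, not how Hadoop is invoked: the `shelves` data and the function names are made up for the example, and a real MapReduce job would also emit key-value pairs and shuffle them between nodes.

```python
from functools import reduce

# Hypothetical library: each shelf is a list of book titles.
shelves = [
    ["Dune", "Emma", "Ulysses"],                      # shelf 1
    ["Hamlet", "Walden"],                             # shelf 2
    ["Beloved", "Dracula", "Ivanhoe", "Persuasion"],  # shelf 3
]


def count_shelf(shelf):
    """Map step: one worker counts the books on a single shelf."""
    return len(shelf)


def add_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Each shelf is counted independently, so this step could run in parallel.
partial_counts = list(map(count_shelf, shelves))

# The partial counts are then folded into a single total.
total_books = reduce(add_counts, partial_counts, 0)

print(partial_counts, total_books)
```

Because no shelf count depends on any other, the map step is the part that can be spread across many nodes; only the small partial counts need to travel to wherever the reduce happens.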

For a deeper understanding, Wikipedia is a good place to start.

4. Distributed data generation is fueling big data growth

We have data problems so big that they require large-scale distributed computing architectures because the creation of the data is also large-scale and distributed.  Most of us walk around carrying devices that are constantly pulsing all sorts of data into the cloud and beyond: our locations, our photos, our tweets, our status updates, our connections, even our heartbeats.

For every human-generated piece of data there’s likely associated machine-generated data.  And then there’s the metadata.  The data is abundant and it’s extremely valuable.

5. Machine learning is…awesome!

One of the key differentiators in big data analytics is the set of machine learning algorithms used to answer interesting questions and derive value from the 0s and 1s we’re furiously chewing up and spitting back out.

Some pretty cool examples:

  • Nest – a beautiful thermostat that learns how hot or cold you like your house so you never have to adjust it again (not technically big data, but fun nonetheless)
  • Gmail’s Bayesian spam filter – no more tempting emails from that pesky Nigerian prince!
  • Amazon’s product recommendations – sure, I’ll take a JavaScript book, a pair of Asics, and season 1 of Game of Thrones.  How do they know me so well?!
  • Varonis’ access control recommendations – ratchet down access based on highly accurate analytics.

If you’re interested in learning more about big data, join our webinar this Wednesday on Mastering Big Data.


New Case Study: Greenhill & Co.

Greenhill & Co., Inc. is a leading independent investment bank. The company was established in 1996 by Robert F. Greenhill, the former President of Morgan Stanley and former Chairman and Chief Executive Officer of Smith Barney.

As the CIO of Greenhill & Co., Inc., John Shaffer sought a data governance solution that could provide visibility into employee access rights, and identify potential issues. Additionally, the company needed a more efficient way to determine when content was moved or deleted, how it was being used, and by whom.

Greenhill & Co. found that trying to manually manage and protect information for a company with global reach was often time-consuming, ineffective and error-prone. The CIO and his team needed automated analysis of the organization’s permissions structure to more efficiently determine which files and folders required owners and who those owners were likely to be.  Identifying likely data owners, in turn, required the ability to analyze actual access activity.

Further, Greenhill required a system to manage access and permissions to sensitive data.

“We liked DatAdvantage because it told us right away the access rights that certain folders had, which people had access to those folders, where the content was moving to, and if that access should be tightened.”

Click here to read the whole case study.

Data Governance Made Easier: Version 5.7 is now GA

Version 5.7 of the Varonis® Data Governance Suite® has been officially released. This version includes enhancements for Varonis DatAdvantage® and Varonis DataPrivilege® as well as a brand new product, DatAdvantage for Directory Services®. Almost all new features and enhancements came straight from our customers, so we would like to say thank you!

Some of the new features and enhancements to Varonis DatAdvantage® version 5.7 include:

  • Reports template wizard: customize the content and look of your reports
  • Flags, tags, and notes: create your own metadata on folders, files, groups, and users
  • Easy change reporting for data owners – automatically receive reports on changes to your folders & groups
  • Demarcation Report – report on folders that need owners based on their permissions and place in the hierarchy, and who those owners are likely to be
  • Support for HP IBRIX X9000 NAS Systems

DatAdvantage for Directory Services® provides new capabilities to audit and monitor Active Directory:

  • View domain and domain objects in DatAdvantage GUI
  • Analyze Organizational Units and other AD objects
  • Augment auditing of changes to AD objects

These new functionalities are viewable in the DatAdvantage® GUI, providing a complete picture of your environment from a single interface.

In version 5.7 of Varonis DataPrivilege®, new features include:

  • Create folders from the DataPrivilege interface for easy collaboration
  • Dynamically assign first authorizer for permission and group membership requests (“Authorizer 0”)
  • “Locations” for groups, adding hierarchical organization to large group structures
  • New reports for data owners

Request a demo or request a 30-day free trial of Varonis® Data Governance Suite version 5.7. Customers may contact for assistance with upgrading.


Another Great Trade Robbery

“The Great Trade Robbery,” a phrase currently used in the context of questionable international trading policies and lopsided sports team player trades, now has yet another meaning. Two recent articles about digital espionage and IP theft by the Chinese government and Chinese businesses describe a new trade robbery that has apparently been going on for some time, and the extreme measures some organizations are taking to protect themselves.

A recent New York Times article discussed how employees now must travel “electronically naked,” meaning they leave all electronic devices at home, because just about everything you carry with you digitally (your personal information, your contacts, your login credentials, your company’s intellectual property) will get stolen. “The Chinese are very good at covering their tracks,” a former F.B.I. agent told the paper. “In most cases, companies don’t realize they’ve been burned until years later when a foreign competitor puts out their very same product, only they’re making it 30 percent cheaper.”

It makes sense that we become a little more circumspect with the information we carry around. Most of us wouldn’t tote our life savings in cash around the block (much less to China) without a very good reason to do so. A single smartphone can now be a gateway into our digital realm (as well as our life savings, because there’s an app for that). An installed Trojan or the outright theft of a device can conceivably lead to the loss of your entire digital life savings and your organization’s valuable data.

A Business Week article, “Hey China, Stop Stealing our Stuff,” provided additional detail about China’s questionable “trading” practices, including sanctioned hacking of foreign entities by the Chinese Government.  The article included a few examples of the impact on the victims – millions of dollars lost, a significant drop in stock price, and a loss of customer confidence.

So we can’t just keep our data at home, apparently. We have to continue to be vigilant even on our “trusted networks.”

China represents a huge market, but these articles illustrate that companies doing business in China or with Chinese interests must begin to think about mitigating new levels of risk, and in some cases take drastic actions like traveling “electronically naked” to minimize potential exposure.

Putting China and extreme security aside for a second, how is your organization doing at some of the more basic data protection tasks? For example:

  • Do you know for certain where all the intellectual property in your organization resides?
  • Do you know who can and does access it?
  • How often is access reviewed?
  • Does the organization allow intellectual property to be accessed or stored on laptops?
  • Does the organization allow intellectual property to be accessed or stored on remote devices, such as smartphones or tablets?

If the answer is “no” to the first two questions, for example, forget about keeping your data secret from China—you may not be able to keep it secret from your kids.

What are you most concerned about when considering your organization’s intellectual property? Please take our informal poll.

Introduction to OAuth (in Plain English)


We’ve talked about giving away your passwords and how you should never do it.  When a website wants to use the services of another, such as Bitly posting to your Twitter stream, it should use OAuth instead of asking you to share your password.

OAuth is an authorization protocol that allows you to approve one application interacting with another on your behalf without giving away your password.

This is a quick guide to illustrate, as simply as possible, how OAuth works.

The OAuth Flow

There are 3 main players in an OAuth transaction: the user, the consumer, and the service provider.  This triumvirate has been affectionately deemed the OAuth Love Triangle.

In our example, Joe is the user, Bitly is the consumer, and Twitter is the service provider that controls Joe’s secure resource (his Twitter stream).  Joe would like Bitly to be able to post shortened links to his stream.  Here’s how it works:

Step 1 – The User Shows Intent

Joe (User): “Hey, Bitly, I would like you to be able to post links directly to my Twitter stream.”
Bitly (Consumer): “Great! Let me go ask for permission.”

Step 2 – The Consumer Gets Permission

Bitly: “I have a user that would like me to post to his stream. Can I have a request token?”
Twitter (Service Provider): “Sure.  Here’s a token and a secret.”

The secret is used to prevent request forgery.  The consumer uses the secret to sign each request so that the service provider can verify it is actually coming from the consumer application.
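To make the signing idea concrete, here is a minimal sketch of an HMAC-SHA1 signature in the style of OAuth 1.0. In that scheme the signing key combines the consumer’s own secret with the token secret. The URL, parameter values, and secrets below are invented for illustration, and the helper names are my own, so treat this as a sketch rather than a drop-in client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode everything except unreserved characters."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return a base64 HMAC-SHA1 signature over the request details."""
    # Encode and sort the parameters so both sides build the same string.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{key}={value}" for key, value in encoded)

    # The signature base string ties together the method, URL, and parameters.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )

    # The key is built from secrets that never travel with the request.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# All values below are hypothetical placeholders.
params = {
    "oauth_consumer_key": "bitly-key",
    "oauth_token": "request-token",
    "oauth_nonce": "abc123",
    "oauth_timestamp": "1335830400",
    "status": "Check out this link",
}
signature = sign_request(
    "POST", "https://api.example.com/statuses/update", params,
    consumer_secret="consumer-secret", token_secret="token-secret",
)

# Changing any signed value produces a different signature.
tampered = dict(params, status="Something else entirely")
tampered_signature = sign_request(
    "POST", "https://api.example.com/statuses/update", tampered,
    consumer_secret="consumer-secret", token_secret="token-secret",
)
print(signature != tampered_signature)
```

The service provider repeats the same computation with its copy of the secrets. If the result doesn’t match the signature sent with the request, the request was either altered or didn’t come from the consumer.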

Step 3 – The User Is Redirected to the Service Provider

Bitly: “OK, Joe.  I’m sending you over to Twitter so you can approve.  Take this token with you.”
Joe: “OK!”

<Bitly directs Joe to Twitter for authorization>

This is the scary part. If Bitly were super-shady Evil Co. it could pop up a window that looked like Twitter but was really phishing for your username and password.  Always be sure to verify that the URL you’re directed to is actually the service provider (Twitter, in this case).
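For the programmatically inclined, the same check can be expressed in code. This sketch assumes a hypothetical allowlist of hosts (`TRUSTED_HOSTS` is my own invention, not an official list) and compares the exact hostname rather than looking for “twitter.com” somewhere in the URL, which a phishing page could easily fake.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real one would come from the provider's documentation.
TRUSTED_HOSTS = {"twitter.com", "api.twitter.com"}


def is_trusted_authorization_url(url, trusted_hosts=TRUSTED_HOSTS):
    """Accept only HTTPS URLs whose hostname exactly matches a trusted host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in trusted_hosts


candidates = [
    "https://api.twitter.com/oauth/authorize?oauth_token=abc",   # genuine-looking
    "https://twitter.com.evil.example/oauth/authorize",          # lookalike domain
    "https://api.twitter.com@evil.example/oauth/authorize",      # userinfo trick
    "http://api.twitter.com/oauth/authorize?oauth_token=abc",    # not encrypted
]
for candidate in candidates:
    print(is_trusted_authorization_url(candidate), candidate)
```

Only the first URL passes. The second and third contain the text “twitter.com” but actually point at a different host, which is exactly the trick to watch for in the address bar.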

Step 4 – The User Gives Permission

Joe: “Twitter, I’d like to authorize this request token that Bitly gave me.”
Twitter: “OK, just to be sure, you want to authorize Bitly to do X, Y, and Z with your Twitter account?”
Joe: “Yes!”
Twitter: “OK, you can go back to Bitly and tell them they have permission to use their request token.”

Twitter marks the request token as “good-to-go,” so when the consumer requests access, it will be accepted (so long as it’s signed using their shared secret).

Step 5 – The Consumer Obtains an Access Token

Bitly: “Twitter, can I exchange this request token for an access token?”
Twitter: “Sure.  Here’s your access token and secret.”

Step 6 – The Consumer Accesses the Protected Resource

Bitly: “I’d like to post this link to Joe’s stream.  Here’s my access token!”
Twitter: “Done!”
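The conversation in Steps 2 through 6 can be mimicked with a toy, in-memory service provider. This is a simplified model to show the order of events, not Twitter’s actual API: the class and method names are invented, nothing goes over the network, and signature verification is left out entirely.

```python
import secrets


class ServiceProvider:
    """Toy in-memory stand-in for the service provider (Twitter in the example)."""

    def __init__(self):
        self.request_tokens = {}  # request token -> approving user (None until Step 4)
        self.access_tokens = {}   # access token -> user it acts for
        self.streams = {}         # user -> list of posted links

    def issue_request_token(self):
        """Step 2: hand the consumer a request token and a secret."""
        token, secret = secrets.token_hex(8), secrets.token_hex(8)
        self.request_tokens[token] = None
        return token, secret

    def authorize(self, request_token, user):
        """Step 4: the user approves the request token."""
        if request_token not in self.request_tokens:
            raise PermissionError("unknown request token")
        self.request_tokens[request_token] = user

    def exchange(self, request_token):
        """Step 5: swap an approved request token for an access token."""
        user = self.request_tokens.get(request_token)
        if user is None:
            raise PermissionError("request token was not approved by the user")
        del self.request_tokens[request_token]  # single use in this sketch
        token, secret = secrets.token_hex(8), secrets.token_hex(8)
        self.access_tokens[token] = user
        return token, secret

    def post(self, access_token, link):
        """Step 6: act on the protected resource using the access token."""
        user = self.access_tokens.get(access_token)
        if user is None:
            raise PermissionError("unknown access token")
        self.streams.setdefault(user, []).append(link)


twitter = ServiceProvider()
request_token, request_secret = twitter.issue_request_token()  # Step 2
twitter.authorize(request_token, user="joe")                   # Steps 3 and 4
access_token, access_secret = twitter.exchange(request_token)  # Step 5
twitter.post(access_token, "http://bit.ly/example")            # Step 6
print(twitter.streams["joe"])
```

Notice that Joe’s password never appears anywhere in the exchange, and that skipping the `authorize` call makes `exchange` fail: the consumer can’t get an access token without the user’s approval.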


In our scenario, Joe never had to share his Twitter credentials with Bitly.  He simply delegated access using OAuth in a secure manner.  At any time, Joe can log in to Twitter, review the access he has granted, and revoke tokens for specific applications without affecting others.  OAuth also allows for granular permission levels.  You can give Bitly the right to post to your Twitter account, but restrict LinkedIn to read-only access.

OAuth Isn’t Perfect…Yet

OAuth is a solid solution for browser-based applications and is a huge improvement over HTTP basic authentication.  However, there are limitations, specifically with OAuth 1.0, that make it far less secure and less user-friendly in native and mobile applications.

OAuth 2.0 is a newer, more secure version of the protocol which introduces different “flows” for web, mobile, and desktop applications.  It also has the notion of token expiration (similar to cookie expiration), requires SSL, and reduces the complexity for developers by no longer requiring signing.
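As a rough sketch of what “no signing” plus “token expiration” looks like from the consumer’s side, the snippet below assumes the common bearer-token style of OAuth 2.0, where the token is simply sent in an `Authorization` header over SSL. The `AccessToken` class and the token value are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class AccessToken:
    value: str
    expires_at: float  # Unix timestamp after which the token is no longer valid

    def is_expired(self, now=None):
        """Check expiry against the given time, or the current time by default."""
        now = time.time() if now is None else now
        return now >= self.expires_at


def authorization_header(token, now=None):
    """Build the request header; no per-request signature is computed."""
    if token.is_expired(now):
        raise ValueError("access token expired; a new one must be obtained")
    return {"Authorization": f"Bearer {token.value}"}


# Hypothetical token that expires at timestamp 1000.
token = AccessToken(value="example-token", expires_at=1000.0)
print(authorization_header(token, now=999.0))
```

Since the token alone is enough to act on the user’s behalf, anyone who intercepts it can use it until it expires, which is why the SSL requirement matters.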

Other Resources

Hopefully this was a good primer to get you familiar with OAuth so the next time you see “Sign-in with Twitter” or similar delegated identity verification, you’ll have a good idea of what is going on.

If you want to dive deeper into the mechanics of OAuth, here are some helpful links: