
5 Things You Should Know About Big Data

2 min read
Last updated January 17, 2023

Big data is a very hot topic, and with last week’s Splunk IPO seeing a 1999-style spike, the bandwagon is overflowing. We’re poised to see many businesses pivot into the big data space, or simply slap a big data sticker on their products (accurate or not) just to ride the wave.

This post introduces a few byte-sized big data concepts (not just trivia) so that you can separate the substance from the hype.


1. Big data is distributed data

Big data is a nebulous term with many different definitions. The key thing to remember is that today, big data is distributed data: the data is so massive that no single node can store or process it.

The days of buying a single big-iron server from IBM or Sun to handle all your business intelligence needs are long gone. Google, Amazon, Facebook, and others have shown that the way to scale quickly and affordably is to distribute the storage and processing of massive data streams across many commodity machines, adding and removing nodes as needed.
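To make “distributed data” a little more concrete, here is a minimal Python sketch of hash partitioning, one common way to decide which node stores which record. The node count, the record keys, and the helper names `node_for` and `partition` are hypothetical; real systems use their own, more elaborate placement schemes.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node for a record key using a stable hash (MD5 of the key)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(records: dict[str, str], num_nodes: int) -> list[dict[str, str]]:
    """Split records into one shard per (hypothetical) node."""
    shards: list[dict[str, str]] = [{} for _ in range(num_nodes)]
    for key, value in records.items():
        shards[node_for(key, num_nodes)][key] = value
    return shards


if __name__ == "__main__":
    # Hypothetical event records keyed by user ID.
    records = {f"user-{i}": f"event-{i}" for i in range(10)}
    for node_id, shard in enumerate(partition(records, num_nodes=3)):
        print(node_id, sorted(shard))
```

MD5 is used instead of Python’s built-in `hash()` because string hashing is randomized per process, so the same key could land on different nodes in different runs. With plain modulo placement, changing the node count moves most keys to new nodes; systems that add and remove nodes often typically use consistent hashing to limit that reshuffling.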

2. You’re going to hear the words “Hadoop” and “MapReduce”

What is Hadoop? It is an open source framework for storing and processing large-scale data across clusters of commodity machines, which organizations use to consolidate, combine, and analyze data in order to make better business decisions. Hadoop powers many (but not all) big data analytics infrastructures.

Hadoop has two key parts:

  • HDFS (Hadoop Distributed File System), which lets you store data across multiple nodes.
  • MapReduce, which lets you process data in parallel across multiple nodes.
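As a rough illustration of the HDFS half, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. The 128 MB block size and replication factor of 3 are HDFS defaults in recent Hadoop versions (both are configurable). The round-robin placement and the `plan_blocks` helper are simplifications for illustration only: real HDFS lets the NameNode choose replica locations using a rack-aware policy.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3                 # HDFS default replication factor


@dataclass
class Block:
    index: int
    offset: int
    length: int
    replicas: list[str]


def plan_blocks(file_size: int, nodes: list[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> list[Block]:
    """Split a file into blocks and give each block `replication` distinct nodes.

    Round-robin placement is an assumption made for this sketch;
    it is not how HDFS actually chooses replica locations.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks: list[Block] = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        replicas = [nodes[(index + r) % len(nodes)] for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
        index += 1
    return blocks


if __name__ == "__main__":
    mb = 1024 * 1024
    # A hypothetical 300 MB file on a hypothetical four-node cluster.
    for b in plan_blocks(300 * mb, ["node-a", "node-b", "node-c", "node-d"]):
        print(b.index, b.length // mb, "MB", b.replicas)
```

Because every block lives on several nodes, losing one machine does not lose the file, and MapReduce can schedule work on a node that already holds a copy of the block it needs.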

Although Hadoop is one of the most popular solutions for crunching big data, there are plenty of others. Big data can’t be shoehorned into one flavor of technology. What matters is that you can draw insights from large quantities of data, regardless of the specific technology.

3. You can understand MapReduce without a degree from Stanford

The best plain-English explanation of MapReduce I’ve encountered (paraphrasing):

We want to count all the books in the library.  You count up shelf #1.  I count up shelf #2.  That’s map. Now we get together and add our individual counts.  That’s reduce.

For a deeper understanding, Wikipedia is a good place to start.
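The library analogy maps fairly directly onto code. This Python sketch runs the map, shuffle, and reduce phases in a single process, whereas in Hadoop each map call would run on a different node. Real MapReduce jobs work on key/value pairs, so this hypothetical version tags each book with a genre and counts per genre; the total book count is just the sum. The shelf data and function names are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical data model: a shelf is a list of (title, genre) pairs.
Shelf = list[tuple[str, str]]


def map_shelf(shelf: Shelf) -> list[tuple[str, int]]:
    """Map: one worker reads one shelf and emits a (genre, 1) pair per book."""
    return [(genre, 1) for _title, genre in shelf]


def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key, across all workers."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: add up the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    library = [
        [("Dune", "sci-fi"), ("Emma", "classic")],                                      # shelf #1
        [("Neuromancer", "sci-fi"), ("Hamlet", "classic"), ("Foundation", "sci-fi")],  # shelf #2
    ]
    counts = reduce_counts(shuffle(map(map_shelf, library)))
    print(counts)                # {'sci-fi': 3, 'classic': 2}
    print(sum(counts.values()))  # 5
```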

4. Distributed data generation is fueling big data growth

We have data problems big enough to require large-scale distributed computing because the data itself is created at large scale, in a distributed way. Most of us walk around carrying devices that constantly pulse all sorts of data into the cloud and beyond: our locations, our photos, our tweets, our status updates, our connections, even our heartbeats.

For every piece of human-generated data, there’s likely associated machine-generated data. And then there’s the metadata. The data is abundant, and it’s extremely valuable.

5. Machine learning is…awesome!

Machine learning algorithms are one of the key differentiators in big data analytics: they answer interesting questions and derive value from the 0s and 1s we’re furiously chewing up and spitting back out.

Some pretty cool examples:

  • Nest – a beautiful thermostat that learns how hot or cold you like your house so you never have to adjust it again (not technically big data, but fun nonetheless)
  • Gmail’s Bayesian spam filter – no more tempting emails from that pesky Nigerian prince!
  • Amazon’s product recommendations – sure, I’ll take a JavaScript book, a pair of Asics, and season 1 of Game of Thrones.  How do they know me so well?!
  • Varonis’ access control recommendations – ratchet down access based on highly accurate analytics.
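For the Bayesian spam filter example, here is a minimal naive Bayes classifier in Python, trained on a few toy messages. It shows the textbook technique only, not Gmail’s actual implementation, which this post does not describe; the class name, the “spam”/“ham” labels, and the training messages are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list[str], label: str, vocab_size: int) -> float:
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior P(label)
        total_words = sum(self.word_counts[label].values())
        for w in words:
            # Smoothed log P(word | label): unseen words never drive the score to zero.
            score += math.log((self.word_counts[label][w] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        """Return the label with the higher posterior (both labels must be trained)."""
        words = tokenize(text)
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label, vocab_size))


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.train("claim your prince inheritance money now", "spam")
    f.train("urgent wire transfer from nigerian prince", "spam")
    f.train("meeting notes for the quarterly review", "ham")
    f.train("lunch tomorrow with the data team", "ham")
    print(f.classify("prince needs urgent money transfer"))  # spam
    print(f.classify("notes from the team meeting"))          # ham
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long messages. The sketch assumes each label has at least one training message; otherwise its prior would be zero.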

If you’re interested in learning more about big data, join our webinar this Wednesday on Mastering Big Data.

photo credit: http://fav.me/d4vqn4w
