Data Classification Tips: Finding Credit Card Numbers

4 Useful Regular Expressions and Algorithm Combinations for Finding Credit Card Numbers

Data classification is a critical piece of the data governance puzzle.  In order to be successful at governing data, you have to know—at all times—where your sensitive data is concentrated, unencrypted, and potentially overexposed.

One of the standard ways to find sensitive data is to use Regular Expressions (RegEx) to match patterns. Used by themselves, regular expressions often identify too much: some of the numbers they find are not really credit card numbers, even though they match the pattern you’re looking for. These “false positives” can be reduced by using algorithmic verification, such as the Luhn checksum for card numbers or the mod-97 check for IBANs. If you don’t know what Regular Expressions are, or you are a bit rusty on the syntax, there are some excellent tutorials on the web (start here or here). If you’d like some help validating your results with Luhn, a good article can be found here (The Varonis IDU Classification Framework has algorithmic validation built-in).
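For readers who want to see what Luhn verification looks like in practice, here is a minimal Python sketch. The function name `luhn_valid` is our own choice for illustration, and this is not the implementation used inside any Varonis product; it simply shows the standard checksum.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the ASCII digits in `candidate` pass the Luhn checksum."""
    # Keep only ASCII digits so spaces and hyphens in "4111 1111 ..." are ignored.
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) to the left and
    # double every second digit, starting with the one next to the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

A number that matches a card pattern but fails this check can be discarded as a false positive. Passing the check does not prove a number is a real card, only that it is well formed.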

What’s considered sensitive?

Well, that really depends on who you’re asking.  Many organizations have idiosyncratic data such as customer or patient IDs, payroll codes, etc. that they want to keep confidential.  But some things are universally considered sensitive – like credit card numbers.

Thus, we figured credit card numbers would be a perfect place to start our RegEx compendium.  Enjoy!

Mastercard

Regular expression:

(?<first>\b(?<![:$.,_'-])5[1-5][0-9]{2}[ -][0-9]{4}[ -][0-9]{4}[ -][0-9]{4}(?![:$.,_'-])\b)|(?<second>\b(?<![:$.,_'-])5((\d)(?!\2{4,})(?!(12345|23456|34567|45678|56789|98765|87654|76543|65432|54321))){15}(?![:$.,_'-])\b)

Suggested Algorithm for Verification: Luhn

AMEX – validate with Luhn

(?<first>\b(?<![:$.,_'-])3[47][0-9]{2}[ -][0-9]{6}[ -][0-9]{5}(?![:$.,_'-])\b)|(?<second>\b(?<![:$.,_'-])3[47]((\d)(?!\2{4,})(?!(12345|23456|34567|45678|56789|98765|87654|76543|65432|54321))){13}(?![:$.,_'-])\b)

Discover – validate with Luhn

(?<first>\b(?<![:$.,_'-])6(?:011|5[0-9]{2})[ -][0-9]{4}[ -][0-9]{4}[ -][0-9]{4}(?![:$.,_'-])\b)|(?<second>\b(?<![:$.,_'-])6((\d)(?!\2{4,})(?!(12345|23456|34567|45678|56789|98765|87654|76543|65432|54321))){15}(?![:$.,_'-])\b)

Visa – validate with Luhn

(?<first>\b(?<![:$.,_'-])4[0-9]{3}[ -][0-9]{4}[ -][0-9]{4}[ -][0-9]{4}\b)|(?<second>\b(?<![:$.,_'-])4((\d)(?!\2{4,})(?!(12345|23456|34567|45678|56789|98765|87654|76543|65432|54321))){15}\b)|(?<third>\b(?<![:$.,_'-])4((\d)(?!\2{4,})(?!(12345|23456|34567|45678|56789|98765|87654|76543|65432|54321))){12}(?![:$.,_'-])\b)
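As a rough illustration of how a pattern and Luhn work together, here is a Python sketch. Note the assumptions: Python's `re` module spells named groups `(?P<name>...)` and numbers groups differently, so the expressions above would need adapting before use there. To sidestep that, the sketch uses a much simpler stand-in pattern of our own (`VISA_CANDIDATE`) that only covers 16-digit Visa-style numbers, and the function names are hypothetical.

```python
import re
from typing import List

# Simplified stand-in (NOT the full pattern from this post): a 16-digit
# number starting with 4, either grouped 4-4-4-4 with spaces or hyphens,
# or written as one run. The guards reject matches glued to word
# characters or to the punctuation the original patterns also exclude.
VISA_CANDIDATE = re.compile(
    r"(?<![\w:$.,'-])"
    r"(?:4\d{3}(?:[ -]\d{4}){3}|4\d{15})"
    r"(?![\w:$.,'-])"
)


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of ASCII digits."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return len(digits) > 1 and total % 10 == 0


def find_visa_numbers(text: str) -> List[str]:
    """Return pattern matches that also pass Luhn, with separators removed."""
    hits = []
    for match in VISA_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The regular expression does the cheap, broad filtering, and the checksum then removes look-alike numbers that happen to fit the shape.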

Special thanks to the Varonis Systems Engineering team for their contributions! In future posts, we’ll share tips for finding other sensitive data using regular expressions, algorithmic verification, and other metadata like permissions and access activity.

Photo credit: Shawn Rossi - http://www.flickr.com/photos/shawnzlea/527857787/

Fixing Access Control without Inciting a Riot

In a previous post, Fixing the Open Shares Problem, we talked about some of the challenges we face when trying to remediate access to open shares. One of the main problems is minimizing the impact these clean-up activities can have on the day-to-day activities of business users.

Think about it: if a global access group is removed from an ACL, the odds are very high that someone who has been using that data will now be restricted. We find ourselves in a catch-22: either leave global access in place, or remediate it and face weeks of business disruption while IT responds to the problems caused by the “fix.”

IT: “I’m sorry that you’re unable to access your data.  We’re working on fixing it now.  I assure you, the only reason this happened is because we were trying to make things better.”

Business user: “I totally understand.  Thank you!  You should get a raise!”

(We all know this is not how the conversation goes).

There’s a better way.

Varonis DatAdvantage provides the ability to simulate permission changes and see the probable outcome before you commit those changes to production. How? DatAdvantage correlates every audit event with the permissions on an ACL and then analyzes the impact of each simulated ACL and/or group change. Through this sandbox, IT can identify the users who would have been affected by that change had it already been made: the users who would have called the help desk screaming that they couldn’t access data they needed.
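To make the idea concrete, here is a toy Python sketch of the underlying reasoning: compare who actually used a folder with who would still have access after a group is removed. Everything here is hypothetical (the function name, the data shapes, the group names); it is not how DatAdvantage is implemented and not its API.

```python
from typing import Dict, List, Set, Tuple


def affected_users(
    audit_events: List[Tuple[str, str]],  # (user, folder) access records
    acl: Dict[str, Set[str]],             # folder -> groups on its ACL
    members: Dict[str, Set[str]],         # group -> users in that group
    folder: str,
    group_to_remove: str,
) -> Set[str]:
    """Users who accessed `folder` but would lose access if the group were removed."""
    # Groups that would remain on the ACL after the simulated change.
    remaining_groups = acl.get(folder, set()) - {group_to_remove}

    # Everyone who would still be granted access through a remaining group.
    still_allowed: Set[str] = set()
    for group in remaining_groups:
        still_allowed |= members.get(group, set())

    # Everyone who has actually been using the folder, per the audit trail.
    active = {user for user, path in audit_events if path == folder}

    return active - still_allowed


# Hypothetical sample data for illustration only.
events = [("alice", "/finance"), ("bob", "/finance"), ("carol", "/hr")]
acl = {"/finance": {"Everyone", "Finance-RW"}}
members = {"Finance-RW": {"alice"}}

print(affected_users(events, acl, members, "/finance", "Everyone"))
```

In this made-up example, bob is the user who would have called the help desk: he uses the folder but is not in any group that remains on the ACL.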

Once you’ve verified that those users really need access, you can continue to configure the ACL and group members within DatAdvantage to provide them access, and keep simulating until you’re confident that your permissions changes will not disturb people’s work. If you have the credentials to be able to make changes, DatAdvantage lets you commit all permissions and group changes right through the interface (over all platforms), either immediately or scheduled to hit a change management window later.

These simulation capabilities eliminate the risks of manually cleaning up open shares, since IT is able to fix the problem without ever impacting legitimate use.  Most IT departments have seen the results of trying to solve this problem manually: lots of broken ACLs and annoyed users. It’s a lot of fun to show them a better way.

You can request a free 1:1 demo of the Varonis suite here or watch our next live demo on the web.


Varonis Data Governance Awards 2012

Varonis is pleased to announce the Varonis Data Governance Awards 2012. The awards are designed to reward the innovation, determination and dedication that our customers apply, every day, to protecting and managing their data with our products. We want to showcase top-class performance and reward achievement, whatever its form.

The awards are free to enter, and are open to all of our customers, regardless of size, location, business type or product deployed. Winning an award will be a sign of excellence, and a distinction that shows that our customers have achieved something to be proud of.

The deadline for entry is July 9th, 2012. More information, including details of the awards, how to enter, terms and conditions, and an FAQ, is available at www.varonis.com/awards.

Exchange Journaling and Diagnostics: How to

Journaling and Diagnostics Logging are services to monitor and audit activity on Microsoft Exchange servers. They provide basic auditing functionality for email activity (e.g. who sent which message to whom) and, if collected and analyzed, may help organizations answer basic questions about email, as well as comply with  policies and regulations. (Note: Varonis DatAdvantage for Exchange does not require journaling or diagnostics to monitor Exchange activity.)

Journaling records email communication traffic and processes messages on the Hub Transport servers. The information collected by the journaling agent can be viewed through journaling reports, which include the original message with all the attachments.

Diagnostics writes additional activities to the event log (visible in Windows Event Viewer), such as “message sent as” and “message sent on behalf of” actions. Diagnostics can be configured through the Manage Diagnostics Logging Properties window in the Exchange Management Console.

Journaling and Diagnostics Logging collect significant amounts of events and generate a large amount of raw log data, so it is critical to plan which mailboxes and messages will be monitored and allocate additional storage before enabling.

Here are the steps to enable Journaling and Diagnostics in your Exchange Server.

Setting up Journaling in Exchange

There are two types of Journaling: standard and premium. Standard provides journaling of all the messages sent and received from mailboxes on a specified mailbox database, while premium provides the ability to journal individual recipients by using journaling rules.

Here are the high-level steps to set up journaling on your Exchange server:

  1. First, create a journaling mailbox. This mailbox will be configured to collect all the journaling reports, and should ideally be set up with no storage limits to avoid missing any. The process to create the mailbox is:
    1. Select a different OU than the default
    2. Assign a display name
    3. Assign a user logon name (the name used to log in to this mailbox)
    4. Set up a password. Keep in mind that journaling mailboxes may contain sensitive information, as a copy of the message is stored with the report.
  2. To enable standard Journaling, modify the properties of the mailbox database. Under the Organization Configuration/Mailbox/Database Management/Maintenance tab, specify the journaling mailbox where you want the journaling reports sent.
  3. Premium Journaling requires an Exchange Enterprise client access license (CAL). To set up premium journaling, create journal rules, which set up journaling for specific recipients. Using the EMC (Exchange Management Console), the journal rules can be created under the Hub Transport section of the Organization Configuration, on the Journal Rules tab. The fields to configure a journal rule are the following:
    1. Name
    2. Send reports to email
    3. Scope
      • Global – all messages through the Hub Transport
      • Internal – messages sent and received by users in the organization
      • External – messages sent to or from recipients outside the organization
    4. Journal messages for recipient – journal messages sent to or from a specific recipient
    5. Enable rule – checkbox

Make sure the status on the completion page is “Completed” to verify that the rule was created successfully.

Setting up Diagnostics in Exchange

Diagnostics logging is configured separately for each service on each server. The steps to configure diagnostics logging are:

  1. In the Exchange Management Console (EMC), click on Server Configuration.
  2. Right-click on an Exchange server  to enable Diagnostics Logging on it.
  3. Click on Manage Diagnostics Logging Properties.
  4. On the Manage Diagnostics Logging window, select the services you want to enable diagnostics for.
  5. Choose the level of diagnostics you would like on that service.
    • Lowest – log only critical events
    • Low – log only events with logging level 1 or lower
    • Medium – log events with logging level 3 or lower
    • High – log events with logging level 5 or lower
    • Expert – log events with logging level 7 or lower
  6. Click Configure. The system will provide a confirmation screen.
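If the level thresholds in step 5 are hard to picture, this small Python sketch models them. It is only an illustration of the list above, not an Exchange API: the names `LEVEL_THRESHOLDS` and `is_logged` are made up, and treating "Lowest" (critical events only) as threshold 0 is our assumption.

```python
# Maximum event logging level written at each diagnostics setting,
# taken from the list above. "Lowest" = 0 is an assumption standing in
# for "critical events only".
LEVEL_THRESHOLDS = {
    "Lowest": 0,
    "Low": 1,
    "Medium": 3,
    "High": 5,
    "Expert": 7,
}


def is_logged(event_level: int, configured: str) -> bool:
    """Return True if an event at `event_level` is written at the configured setting."""
    return event_level <= LEVEL_THRESHOLDS[configured]
```

The practical takeaway is that each step up includes everything from the steps below it, which is why higher settings generate so much more log data.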

In a future post, we will go over the Mailbox Audit Logging in MS Exchange 2010.

InfoSecurity 2012 Highlights

Last week I was fortunate to attend InfoSecurity 2012 in London. The energy level seemed much higher than in previous years, for both attendees and exhibitors. Just under 13,000 people were there, up 24% from last year. Those who stopped by our booth seemed to have a real sense of urgency about data protection, and related projects appear to be getting a lot of priority right now. Upcoming EU privacy legislation seemed to be on a lot of people’s minds, as well as all the recent breaches in the news.

It’s good news if more organizations are truly starting to notice and pay attention to data protection; conscious attention is a prerequisite to change. As the results of our recent data protection survey show, attention and change are certainly needed.

One of the highlights of the week was Varonis taking home SC Magazine’s Best Network Security Award. It’s gratifying to have such an esteemed group recognize the work we’re doing trying to help manage and protect data.

I’m looking forward to next year’s show, and hoping that increased attention helps improve the state of data security in the meantime.

What Do U.S. Security Legislation and Insurance Companies Have in Common?

Answer:  Both may affect the way businesses determine what constitutes appropriate security measures.

In February, Senators Joe Lieberman, Susan Collins, John D. Rockefeller IV, and Dianne Feinstein introduced the Cybersecurity Act of 2012. The intent of the Act is to give the Department of Homeland Security (DHS) additional power to set cyber security standards for private companies that operate the nation’s critical infrastructure. Simply speaking, the intent of the bill is to:

  1. Identify risk via cooperation between DHS and private corporations
  2. Protect critical infrastructure (although what exactly constitutes critical infrastructure is yet to be defined)
  3. Improve information sharing about security issues and events between DHS and private corporations

According to the Homeland Security Website:  “The bill would authorize the Secretary of Homeland Security, together with the private sector, to determine cyber security performance requirements based upon the risk assessments. The performance requirements would cover critical infrastructure systems and assets whose disruption could result in severe degradation of national security, catastrophic economic damage, or the interruption of life-sustaining services sufficient to cause mass casualties or mass evacuations. The bill would only cover the most critical systems and assets in a given sector, and only if they are not already being appropriately secured.”

The website goes on, “Owners of ‘covered critical infrastructure’ would have the flexibility to meet the cybersecurity performance requirements in the manner they deem appropriate. The private sector also would have the opportunity to develop and propose performance requirements for ‘covered critical infrastructure.’”

http://www.hsgac.senate.gov/download/the-cybersecurity-act-of-2012-s-2105_-summary

In this regard, if this bill is passed, companies that operate anything that might be lumped into the category of critical infrastructure (e.g. financial, energy, food, medical, healthcare) may need to rethink their risk tolerance, security engineering methodologies and security operations practices. If your company does operate critical infrastructure, the Department of Homeland Security may soon police your security engineering efforts.

Coincidentally, the insurance industry is also affecting how Security Admins determine appropriate security measures for their companies. Cyber insurance was created to protect the interests of companies in the event of a loss due to a variety of issues, including data breaches, cyber-extortion, content liability, penalties for civil actions resulting from failure to comply with a specific regulation, virus liability, cyber terrorism, loss of income due to hacking, DoS attacks, etc. While cyber insurance may be worthwhile, as those of us with homeowners or automobile insurance know, insurance policies always contain a list of exclusions.

Cyber Insurance is no different.  Notable exclusions can include vague statements such as:

  • Loss caused by an employee, officer, director, owner, or independent contractor
  • Failure to follow minimum required practices
  • Failure to take reasonable security measures

Given that Security Admins are paid to take “reasonable” security measures, it’s hard to imagine how these exclusions will be interpreted in the event of a breach.  Only an attorney can determine the actual impact of these exclusions.  Ultimately, Security Admins are compelled to work with their legal department and other business areas to ensure that their Cyber Insurance policy provides coverage in the event of a breach.   In this regard, insurance companies may influence your security engineering efforts, as well.

At a recent trade show, an attendee told me that his company was forced to purchase Cyber Insurance. When I asked him why, he indicated that one of his customers required Cyber Insurance as a condition of doing business with them. This customer understood that a prerequisite to determining which Cyber Insurance policy was appropriate was to involve business data owners, who are best prepared to determine the risk associated with their area of interest. Many companies have recognized the value of including business areas and specifically data owners in security engineering planning.

The Cybersecurity Act of 2012 and Cyber Insurance are two motivating factors which will encourage companies to better understand risk and tolerance, and foster cooperation between IT security and data owners.

http://www.iqpc.com/uploadedFiles/EventRedesign/USA/2011/November/20810001/Assets/2011-The-Rapidly-Evolving-Nature-of-Cyber-Risk.pdf

http://www.cpcusociety.org/file_depot/0-10000000/0-10000/3267/conman/CPCUeJournalDec08article.pdf

5 Things You Should Know About Big Data

Big data is a very hot topic, and with the Splunk IPO last week seeing a 1999-style spike, the bandwagon is overflowing.  We’re poised to see many businesses pivoting into the big data space or simply slapping a big data sticker on their products, accurate or not, just to ride the wave.

This post aims to help educate you with a few byte-sized big data concepts (not just trivia) so that you can distinguish the substance from the hype.

1. Big data is distributed data

Big data is a nebulous term with many different definitions.  The key thing to remember is that in this day and age, big data is distributed data.  This means the data is so massive it cannot be stored or processed by a single node.

The days of buying a single big iron server from IBM or Sun to handle all your business intelligence needs are long gone.  It’s been proven by Google, Amazon, Facebook, and others that the way to scale fast and affordably is to use commodity hardware to distribute the storage and processing of our massive data streams across several nodes, adding and removing nodes as needed.

2. You’re going to hear the words “Hadoop” and “MapReduce”

What is Hadoop?   It is an open source platform for consolidating, combining and understanding large-scale data in order to make better business decisions. Hadoop is the technology powering many (but not all) big data analytics infrastructures.

There are 2 key parts to Hadoop:

  • HDFS (Hadoop distributed file system) which lets you store data across multiple nodes.
  • MapReduce which lets you process data in parallel across multiple nodes.

Although Hadoop is one of the most popular solutions for crunching big data, there are plenty of others.  Big data can’t be shoehorned into one flavor of technology.  The important characteristic is that you’re able to draw insights from large quantities of data, independent of specific technologies.

3. You can understand MapReduce without a degree from Stanford

The best plain English explanation of MapReduce I’ve encountered (paraphrasing):

We want to count all the books in the library.  You count up shelf #1.  I count up shelf #2.  That’s map. Now we get together and add our individual counts.  That’s reduce.
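Here is the library analogy as a tiny Python sketch. It runs on a single machine with the built-in `map` and `functools.reduce`, so it only shows the shape of the idea, not Hadoop itself; the shelf data and function names are invented for the example.

```python
from functools import reduce
from typing import Dict, List


def count_shelf(books: List[str]) -> int:
    """Map step: one worker counts the books on one shelf."""
    return len(books)


def add_counts(running_total: int, shelf_count: int) -> int:
    """Reduce step: combine two partial counts into one."""
    return running_total + shelf_count


def count_books(shelves: Dict[str, List[str]]) -> int:
    """Count every book in the library by mapping over shelves, then reducing."""
    shelf_counts = map(count_shelf, shelves.values())  # could run in parallel
    return reduce(add_counts, shelf_counts, 0)


# Made-up library for illustration.
library = {
    "shelf 1": ["Dune", "Emma", "Ulysses"],
    "shelf 2": ["Walden", "Beloved"],
}
print(count_books(library))
```

In a real cluster the map step would run on many nodes at once, each with its own slice of the data, and only the small partial counts would travel over the network to be reduced.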

For a deeper understanding, Wikipedia is a good place to start.

4. Distributed data generation is fueling big data growth

The reason we have data problems so big that they require large-scale distributed computing architectures to solve is that the creation of the data is also large-scale and distributed.  Most of us walk around carrying devices that are constantly pulsing all sorts of data into the cloud and beyond – our locations, our photos, our tweets, our status updates, our connections, even our heartbeats.

For every human-generated piece of data there’s likely associated machine-generated data.  And then there’s the metadata.  The data is abundant and it’s extremely valuable.

5. Machine learning is…awesome!

One of the key differentiators in big data analytics is the set of machine learning algorithms used to answer interesting questions and derive value from the 0s and 1s we’re furiously chewing up and spitting back out.

Some pretty cool examples:

  • Nest – a beautiful thermostat that learns how hot or cold you like your house so you never have to adjust it again (not technically big data, but fun nonetheless)
  • Gmail’s Bayesian spam filter – no more tempting emails from that pesky Nigerian prince!
  • Amazon’s product recommendations – sure, I’ll take a JavaScript book, a pair of Asics, and season 1 of Game of Thrones.  How do they know me so well?!
  • Varonis’ access control recommendations – ratchet down access based on highly accurate analytics.

If you’re interested in learning more about big data, join our webinar this Wednesday on Mastering Big Data.

photo credit: http://fav.me/d4vqn4w

The State of Data Protection [INFOGRAPHIC]

In the age of big data, businesses are creating, processing, storing, and sharing information at an alarming rate.  A significant amount of the data is highly sensitive or confidential and should be properly safeguarded.  It’s unnerving to think about the possibility of our own personal information sitting on servers, possibly unencrypted and open to everyone.

We hope that companies are complying with SOX, HIPAA, PCI, and other regulations but, as we know, hope is not a strategy – so we decided to take a hard look at the current state of data protection.

In March of 2012 we surveyed over 200 individuals in the IT community, asking about their current data protection practices and confidence levels, and examined how those confidence levels correlate with data protection activities.

The results may surprise you. While over 80% reported that they store data belonging to customers, vendors, and other business partners, only 26% reported being very confident that data stored within their organization is protected.

Enjoy, share, embed our infographic and download the full report to learn which data protection activities truly matter.

The State of Data Protection

Embed this infographic on your own site

Copy and paste the code below into your blog post or web page:

<a href="http://blog.varonis.com/the-state-of-data-protection-infographic/"><img title="The State of the Data Protection - Infographic" src="http://www.varonis.com/assets/infographics/state-of-data-protection.png" alt="The State of Data Protection" width="600" height="2500" /></a>
<p><small>Like this infographic? Get more <a href="http://blog.varonis.com">data protection</a> tips from <a href="http://www.varonis.com/">Varonis</a>.</small></p>

New Case Study: Greenhill & Co.

Greenhill & Co., Inc. is a leading independent investment bank. The company was established in 1996 by Robert F. Greenhill, the former President of Morgan Stanley and former Chairman and Chief Executive Officer of Smith Barney.

As the CIO of Greenhill & Co., Inc., John Shaffer sought a data governance solution that could provide visibility into employee access rights, and identify potential issues. Additionally, the company needed a more efficient way to determine when content was moved or deleted, how it was being used, and by whom.

Greenhill & Co. found that trying to manually manage and protect information for a company with global reach was often time-consuming, ineffective and error-prone. The CIO and his team needed automated analysis of the organization’s permissions structure to more efficiently determine which files and folders required owners and who those owners were likely to be.  Identifying likely data owners required the ability to analyze actual access activity.

Further, Greenhill required a system to manage access and permissions to sensitive data.

“We liked DatAdvantage because it told us right away the access rights that certain folders had, which people had access to those folders, where the content was moving to, and if that access should be tightened.”

Click here to read the whole case study.

Data Governance Made Easier: Version 5.7 is now GA

Version 5.7 of the Varonis® Data Governance Suite® has been officially released. This version includes enhancements for Varonis DatAdvantage® and Varonis DataPrivilege® as well as a brand new product, DatAdvantage for Directory Services®. Almost all new features and enhancements came straight from our customers, so we would like to say thank you!

Some of the new features and enhancements to Varonis DatAdvantage® version 5.7 include:

  • Reports template wizard: customize the content and look of your reports
  • Flags, tags, and notes: create your own metadata on folders, files, groups, and users
  • Easy change reporting for data owners – automatically receive reports on changes to your folders & groups
  • Demarcation Report – report on folders that need owners based on their permissions and place in the hierarchy, and who those owners are likely to be
  • Support for HP IBRIX X9000 NAS Systems

DatAdvantage for Directory Services® provides new capabilities to audit and monitor Active Directory:

  • View domain and domain objects in DatAdvantage GUI
  • Analyze Organizational Units and other AD objects
  • Augment auditing of changes to AD objects

These new functionalities are viewable in the DatAdvantage® GUI, providing a complete picture of your environment from a single interface.

In version 5.7 of Varonis DataPrivilege®, new features include:

  • Create folders from the DataPrivilege interface for easy collaboration
  • Dynamically assign first authorizer for permission and group membership requests (“Authorizer 0”)
  • “Locations” for groups, adding hierarchical organization to large group structures
  • New reports for data owners

Request a demo or request a 30-day free trial of Varonis® Data Governance Suite version 5.7. Customers may contact support@varonis.com for assistance with upgrading.

 

