Companies don't often wind up in the headlines for having their networks or endpoints stolen. Those things get infected or broken into, but they don't get stolen. Headlines are made — and reputations are destroyed — for stolen data.
You don't want that to happen to you. So, to protect your business and help it thrive, you must be able to see, track, and analyze every query, modification, deletion, or other data transaction.
That's not hard, but it may be painful. To achieve this, holistically, you have to understand the organization's secure development life cycle (SDLC) — at which point, you may find out that the "secure" part is just wishful thinking.
Top-Down Data Hygiene
To find out one way or the other, you start with the organization's most mission-critical app or apps. (If you're dealing with a Fortune 500 company with tens of thousands of databases, this may not be entirely practical, but you could at least consider an appropriate sampling.) From there, it's about understanding how your organization deals with data from beginning to end: starting at the development process and development servers, and proceeding to the test servers, to the quality assurance servers, to production — and so forth from there, presumably all the way back around to the beginning. What data do you have at each stage?
Now take all of that data and categorize it by risk and type, to rate the priority, severity, and criticality of each data point. And then ask whether the data changes as it moves from stage to stage, from server to server.
And what you should almost never find, but may well find, is that production data — the most critical data your company has — moves through those stages unchanged. That's a big red flag. Next, you need to talk to stakeholders about why production data is being exposed outside of production — and they had better have a darned good reason. Developers don't need to know Customer A's phone number. The business analytics team probably doesn't need Customer B's credit card number. And all a business-to-consumer marketing team probably cares about is how many 18- to 25-year-old males in Houston or 35- to 44-year-old females in New York City are buying the company's product. Occasionally, someone will genuinely need production data, but you are usually better served by masking it.
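A minimal sketch of that audit, assuming a hypothetical risk scheme and field inventory (all names here are invented for illustration; a real classification comes from your own data-governance policy):

```python
from dataclasses import dataclass

# Hypothetical risk tiers; your policy will define its own categories.
RISK = {"pii": 3, "financial": 3, "behavioral": 2, "aggregate": 1}

@dataclass
class Field:
    name: str
    category: str   # one of RISK's keys
    stages: list    # environments where a copy of this field lives

# Invented example inventory.
inventory = [
    Field("customer_phone", "pii", ["prod", "test", "dev"]),
    Field("card_number", "financial", ["prod"]),
    Field("purchases_by_region", "aggregate", ["prod", "analytics"]),
]

def audit(fields, threshold=3):
    """Flag the red flag described above: high-risk fields
    with copies living outside production."""
    return [f.name for f in fields
            if RISK[f.category] >= threshold and set(f.stages) - {"prod"}]

print(audit(inventory))  # → ['customer_phone']
```

Here, the customer phone number is the field to chase down with stakeholders: it is top-tier risk, yet copies of it sit on dev and test servers.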
Data Masking to Avoid Disaster
Data masking is a process by which copies of data are obfuscated (usually irreversibly) such that they still look realistic enough to remain workable and useful for whoever needs to play with them. "Sally Smith" becomes "Jessica Jones." Credit card number "4444-3333-2222-1111" becomes "4321-5555-6666-7777." And so forth. Data masking is essential not just to security; its inherent pseudonymization also helps with compliance under data-protection regimes like the EU's General Data Protection Regulation.
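To make the idea concrete, here is a toy sketch of format-preserving, deterministic masking — not a production tool, and the word lists and salt are invented for illustration. Deriving the replacement from a hash of the original means the same input always masks to the same output, so relationships across tables survive:

```python
import hashlib
import random

# Tiny invented word lists; real masking tools draw on far larger ones.
FIRST_NAMES = ["Jessica", "Alan", "Priya", "Marcus", "Elena"]
LAST_NAMES = ["Jones", "Okafor", "Nguyen", "Silva", "Kaur"]

def _seed(value: str, salt: str) -> int:
    """Derive a stable seed from the original value, so masking is
    deterministic (referential integrity) but not reversible in practice."""
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return int(digest, 16)

def mask_name(name: str, salt: str = "demo-salt") -> str:
    """'Sally Smith' becomes some other realistic-looking name."""
    rng = random.Random(_seed(name, salt))
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"

def mask_card(card: str, salt: str = "demo-salt") -> str:
    """Replace every digit with a pseudorandom one, keeping the dashes
    and length so downstream code that parses card numbers still works."""
    rng = random.Random(_seed(card, salt))
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch
                   for ch in card)

print(mask_name("Sally Smith"))
print(mask_card("4444-3333-2222-1111"))
```

The masked card keeps the shape of a real one — same length, dashes in the same places — which is exactly what makes the copy "workable" for developers and testers without exposing the real number.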
If your most mission-critical apps are needlessly exposing your production data, you can stop right there because it's a given that the problem is systemic and that the rest of your apps are also a data liability. There is no "S" in your "SDLC." Time to roll up your sleeves and get to work.
If your mission-critical apps get a pass, however, then you may want to examine some of the so-called "lesser" apps that still have private data — and see if they are following the same processes. Sometimes, companies will appropriately prioritize around their perceived mission-critical apps by buying technology and implementing it around those apps and their data — but around nothing else. This creates a situation where the front door is bolted shut, but the back door is wide open; just think about how the data used by your "lesser" apps ends up getting copied dozens or hundreds of times across other apps. This is what happened in the Adobe mega-breach of 2013, in which attackers compromised more than 130 million customer accounts by gaining access to a poorly protected, set-to-be-decommissioned backup authentication system.
More recently, Uber confessed to covering up a 2016 breach affecting more than 57 million users. The breach happened after hackers compromised the GitHub credentials of a developer or two — indicating that Uber's attack surface was needlessly broad; the company allowed its developers to access and copy sensitive data that they likely didn't need.
Had Uber instead masked its production data, the hack could potentially have been a PR win: the company could have announced that it had been hacked but that, because of the safeguards it places on user data, it was able to prevent exposure while working with authorities to catch the bad guys.
That's the ride-hailing app I'd rather do business with. Even if they charge a bit more money, at least I would know that they treat my data like their crown jewels. Because that's what data is.